
    Namespace Lucene.Net.Misc

    Miscellaneous index tools.

    Classes

    GetTermInfo

    Utility that reports a term's document frequency and its total number of occurrences (the sum of its term frequency, tf, across all documents).
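As a rough illustration of what these two statistics mean, the toy Python sketch below computes them from a single term's postings list. It is not the Lucene.NET API: `term_info` and the list-of-`(doc_id, tf)` postings format are hypothetical stand-ins for the on-disk index. Document frequency is the number of postings, and the total number of occurrences is the sum of their tf values.

```python
from typing import List, Tuple

# Hypothetical toy postings list: one (doc_id, term_frequency) pair per
# document that contains the term. A real Lucene index stores this on disk.
Postings = List[Tuple[int, int]]


def term_info(postings: Postings) -> Tuple[int, int]:
    """Return (doc_freq, total_term_freq) for a single term."""
    doc_freq = len(postings)  # number of documents containing the term
    total_term_freq = sum(tf for _, tf in postings)  # sum of tf over those docs
    return doc_freq, total_term_freq


if __name__ == "__main__":
    # The term occurs 3 times in doc 0, once in doc 4 and twice in doc 7.
    print(term_info([(0, 3), (4, 1), (7, 2)]))  # (3, 6)
```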

    HighFreqTerms

    Extracts the top n most frequent terms (by document frequency) from an existing Lucene index and reports their document frequencies.

    If the -t flag is given, both document frequency and total tf (total number of occurrences) are reported, ordered by descending total tf.
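The ranking HighFreqTerms performs can be sketched over a toy in-memory index. In this Python sketch, `high_freq_terms`, the `TermStats` dataclass and the dict-of-postings index are hypothetical illustrations, not Lucene.NET types; `by_total_tf=True` stands in for the -t flag. As a design choice of the sketch, `heapq.nlargest` keeps only n candidates instead of fully sorting every term.

```python
import heapq
from dataclasses import dataclass
from operator import attrgetter
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TermStats:
    """Toy stand-in for a term plus its DocFreq and TotalTermFreq."""
    term: str
    doc_freq: int
    total_term_freq: int


def high_freq_terms(index: Dict[str, List[Tuple[int, int]]], n: int,
                    by_total_tf: bool = False) -> List[TermStats]:
    """Return the top n terms of a toy index, most frequent first.

    index maps each term to its postings, a list of (doc_id, tf) pairs.
    by_total_tf=False ranks by document frequency; True mimics the -t flag
    and ranks by total term frequency instead.
    """
    stats = [TermStats(t, len(p), sum(tf for _, tf in p))
             for t, p in index.items()]
    key = attrgetter("total_term_freq" if by_total_tf else "doc_freq")
    return heapq.nlargest(n, stats, key=key)


if __name__ == "__main__":
    idx = {"a": [(0, 1), (1, 1), (2, 1)], "b": [(0, 10), (1, 5)], "c": [(3, 1)]}
    print([s.term for s in high_freq_terms(idx, 2)])                    # ['a', 'b']
    print([s.term for s in high_freq_terms(idx, 1, by_total_tf=True)])  # ['b']
```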

    HighFreqTerms.DocFreqComparer

    Compares TermStats instances by DocFreq.

    HighFreqTerms.TotalTermFreqComparer

    Compares TermStats instances by TotalTermFreq.
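Both comparers impose an ordering on term statistics. The Python sketch below expresses them as three-way comparison functions and sorts with `functools.cmp_to_key`; the `TermStats` tuple and the function names are hypothetical, and breaking ties on the term text is an assumption of this sketch rather than documented behavior.

```python
from functools import cmp_to_key
from typing import NamedTuple


class TermStats(NamedTuple):
    """Toy stand-in: a term with its document and total term frequencies."""
    term: str
    doc_freq: int
    total_term_freq: int


def _cmp(a, b) -> int:
    """Three-way comparison: negative, zero, or positive."""
    return (a > b) - (a < b)


def doc_freq_compare(x: TermStats, y: TermStats) -> int:
    # Primary key: DocFreq. Tie-break on term text (an assumption).
    return _cmp(x.doc_freq, y.doc_freq) or _cmp(x.term, y.term)


def total_term_freq_compare(x: TermStats, y: TermStats) -> int:
    # Primary key: TotalTermFreq. Tie-break on term text (an assumption).
    return _cmp(x.total_term_freq, y.total_term_freq) or _cmp(x.term, y.term)


if __name__ == "__main__":
    terms = [TermStats("a", 3, 3), TermStats("b", 2, 15), TermStats("c", 3, 4)]
    by_df = sorted(terms, key=cmp_to_key(doc_freq_compare), reverse=True)
    by_ttf = sorted(terms, key=cmp_to_key(total_term_freq_compare), reverse=True)
    print([t.term for t in by_df])   # ['c', 'a', 'b']
    print([t.term for t in by_ttf])  # ['b', 'c', 'a']
```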

    IndexMergeTool

    Merges indices specified on the command line into the index specified as the first command line argument.
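Merging indexes means appending the documents of several sources and renumbering them so that document IDs do not collide. The toy Python sketch below illustrates that idea together with the argument convention described above (destination first, sources after). It is not how IndexMergeTool or Lucene.NET merges on-disk segments; `merge_indexes`, `parse_args` and the `(max_doc, postings)` index shape are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy index: (max_doc, postings), where postings maps
# each term to a list of (doc_id, tf) pairs.
ToyIndex = Tuple[int, Dict[str, List[Tuple[int, int]]]]


def merge_indexes(sources: Sequence[ToyIndex]) -> ToyIndex:
    """Append each source after the previous ones, shifting its doc IDs."""
    merged: Dict[str, List[Tuple[int, int]]] = {}
    doc_base = 0  # doc ID offset applied to the current source
    for max_doc, postings in sources:
        for term, plist in postings.items():
            merged.setdefault(term, []).extend(
                (doc_base + doc_id, tf) for doc_id, tf in plist)
        doc_base += max_doc
    return doc_base, merged


def parse_args(argv: Sequence[str]) -> Tuple[str, List[str]]:
    """Split arguments into (destination, sources): destination comes first."""
    # Requiring at least one source index is an assumption of this sketch.
    if len(argv) < 2:
        raise SystemExit("usage: <mergedIndex> <index1> [index2] ...")
    return argv[0], list(argv[1:])
```

For example, if the first source holds 2 documents, document 0 of the second source becomes document 2 in the merged result.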

    SweetSpotSimilarity

    A similarity whose lengthNorm provides a "plateau" of equally good lengths, plus helper functions for tf.

    For lengthNorm, a min and max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off along an inverse square-root curve.
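The plateau can be expressed as a single formula. The Python sketch below follows the length-norm formula of Apache Lucene's Java SweetSpotSimilarity as we understand it: the quantity |n - min| + |n - max| - (max - min) is zero inside the plateau and grows with the distance outside it, and the norm is one over the square root of a steepness-scaled copy of that quantity plus one. The function name, parameter names and the 0.5 default steepness are assumptions of this sketch.

```python
import math


def sweet_spot_length_norm(num_terms: int, ln_min: int, ln_max: int,
                           steepness: float = 0.5) -> float:
    """Length norm that is 1.0 for ln_min <= num_terms <= ln_max.

    |n - min| + |n - max| - (max - min) is 0 inside the plateau and equals
    twice the distance to the nearest plateau edge outside it.
    """
    excess = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * excess + 1.0)
```

For example, with a plateau of 3 to 5 terms and a steepness of 0.5, fields of 3, 4 or 5 terms get a norm of 1.0, while a 0-term or an 8-term field gets 1/sqrt(4) = 0.5.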

    For tf, two helper functions, baselineTf and hyperbolicTf, are provided for subclasses to choose between.
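The two tf helpers can be sketched as follows. The formulas and default parameters mirror Apache Lucene's Java SweetSpotSimilarity as we recall them and should be treated as assumptions; the function and parameter names are hypothetical. baselineTf returns a flat base value up to a minimum frequency and a square-root curve above it; hyperbolicTf is an S-shaped curve bounded between a minimum and a maximum, written here with `math.tanh` because (b^x - b^-x)/(b^x + b^-x) equals tanh(x ln b).

```python
import math


def baseline_tf(freq: float, tf_min: float = 0.0, tf_base: float = 0.0) -> float:
    """tf_base up to tf_min, then a square-root curve that continues from it."""
    if freq == 0.0:
        return 0.0
    if freq <= tf_min:
        return tf_base
    # At freq == tf_min this would equal tf_base, so the curve is continuous.
    return math.sqrt(freq + tf_base * tf_base - tf_min)


def hyperbolic_tf(freq: float, tf_min: float = 0.0, tf_max: float = 2.0,
                  base: float = 1.3, x_offset: float = 10.0) -> float:
    """S-shaped tf bounded by tf_min and tf_max, centred on x_offset."""
    if freq == 0.0:
        return 0.0
    x = freq - x_offset
    # tanh(x * ln(base)) lies in (-1, 1), so the result stays in [tf_min, tf_max].
    return tf_min + (tf_max - tf_min) / 2.0 * (math.tanh(x * math.log(base)) + 1.0)
```

The hyperbolic form suits cases where extra occurrences should stop adding weight beyond some point, since it approaches tf_max instead of growing without bound.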

    TermStats

    Holder for a term along with its statistics (DocFreq and TotalTermFreq).

    Copyright © 2022 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.