    Namespace Lucene.Net.Misc

    Misc Tools

    The misc package provides tools for splitting and merging indices, changing norms, finding high-frequency terms, and other tasks.

    Classes

    GetTermInfo

    Utility to get the document frequency of a term and its total number of occurrences (the sum of its term frequency, tf, over all documents).

    LUCENENET specific: In the Java implementation, this class's Main method was intended to be called from the command line. In .NET, however, a method within a DLL cannot be called directly from the command line, so we provide a .NET tool, lucene-cli, with a command that maps to that method: index list-term-info
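
    To make the two statistics concrete, here is a minimal, library-independent sketch in Python. It does not use the Lucene.NET API: the helper name term_stats and the toy corpus are hypothetical, and documents are assumed to be already tokenized into lists of terms.

```python
from collections import Counter

def term_stats(docs, term):
    """Return (doc_freq, total_term_freq) for `term` over tokenized `docs`."""
    doc_freq = 0          # number of documents containing the term at least once
    total_term_freq = 0   # sum of the per-document term frequency (tf)
    for tokens in docs:
        tf = Counter(tokens)[term]
        if tf > 0:
            doc_freq += 1
            total_term_freq += tf
    return doc_freq, total_term_freq

# Hypothetical toy corpus: three already-tokenized documents.
docs = [
    ["lucene", "index", "lucene"],
    ["search", "index"],
    ["lucene"],
]
print(term_stats(docs, "lucene"))  # prints (2, 3)
```

    The term "lucene" appears in two documents (document frequency 2) but three times overall (total term frequency 3), which is the distinction the utility reports.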

    HighFreqTerms

    The HighFreqTerms class extracts the top n most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency.

    LUCENENET specific: In the Java implementation, this class's Main method was intended to be called from the command line. In .NET, however, a method within a DLL cannot be called directly from the command line, so we provide a .NET tool, lucene-cli, with a command that maps to that method: index list-high-freq-terms

    HighFreqTerms.DocFreqComparer

    Compares terms by DocFreq.

    HighFreqTerms.TotalTermFreqComparer

    Compares terms by TotalTermFreq.
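
    The two orderings can rank the same terms differently. The following is a small Python sketch of that idea, independent of Lucene.NET: the Stats tuple, collect_stats, top_terms, and the toy corpus are all hypothetical names introduced for illustration, and a simple heap-based top-n selection is assumed rather than taken from the library.

```python
import heapq
from collections import Counter, namedtuple

# Hypothetical stand-in for a term together with its two statistics.
Stats = namedtuple("Stats", ["term", "doc_freq", "total_term_freq"])

def collect_stats(docs):
    """Compute per-term statistics over a list of tokenized documents."""
    doc_freq = Counter()
    total_term_freq = Counter()
    for tokens in docs:
        total_term_freq.update(tokens)   # every occurrence counts
        doc_freq.update(set(tokens))     # each document counts once per term
    return [Stats(t, doc_freq[t], total_term_freq[t]) for t in doc_freq]

def top_terms(stats, n, by_total_term_freq=False):
    """Return the n highest-ranked terms under the chosen ordering."""
    if by_total_term_freq:
        key = lambda s: s.total_term_freq
    else:
        key = lambda s: s.doc_freq
    return heapq.nlargest(n, stats, key=key)

# Hypothetical toy corpus: "a" is dense in one document, "b" is spread out.
docs = [
    ["a", "a", "a", "a", "b"],
    ["b", "c"],
    ["b", "c"],
]
stats = collect_stats(docs)
print([s.term for s in top_terms(stats, 1)])                           # prints ['b']
print([s.term for s in top_terms(stats, 1, by_total_term_freq=True)])  # prints ['a']
```

    Ranking by document frequency favors terms that occur in many documents, while ranking by total term frequency favors terms with many occurrences overall, even if they are concentrated in a few documents.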

    IndexMergeTool

    Merges indices specified on the command line into the index specified as the first command line argument.

    LUCENENET specific: In the Java implementation, this class's Main method was intended to be called from the command line. In .NET, however, a method within a DLL cannot be called directly from the command line, so we provide a .NET tool, lucene-cli, with a command that maps to that method: index merge

    SweetSpotSimilarity

    A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.

    For lengthNorm, a min and max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off following a sqrt function.
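
    The plateau behavior can be sketched in a few lines of Python. This is not the Lucene.NET API. The formula is the one documented for the Java SweetSpotSimilarity as far as can be recalled here, so treat it as an assumption; this page only states that the drop-off is a sqrt function. The steepness parameter and the name length_norm are likewise assumptions.

```python
import math

def length_norm(num_terms, ln_min, ln_max, steepness):
    """Norm of 1.0 on the plateau [ln_min, ln_max], falling off outside it.

    Assumed formula (not stated on this page):
        1 / sqrt(steepness * (|n - min| + |n - max| - (max - min)) + 1)
    """
    # Twice the distance outside [ln_min, ln_max]; exactly zero on the plateau.
    overshoot = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * overshoot + 1.0)

# Hypothetical settings: plateau from 3 to 5 terms, steepness 0.5.
for n in (1, 3, 4, 5, 7):
    print(n, length_norm(n, 3, 5, 0.5))
```

    With these settings, lengths 3, 4, and 5 all get a norm of 1.0, while lengths 1 and 7 (equally far outside the plateau) get the same reduced norm.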

    For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.
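
    As a rough illustration of how the two tf shapes differ, here is a Python sketch. It is not the Lucene.NET API. The formulas follow those documented for the Java class as far as can be recalled here and should be read as assumptions, and all parameter names (tf_base, tf_min, tf_max, base, x_offset) are chosen for this example only.

```python
import math

def baseline_tf(freq, tf_base, tf_min):
    """Flat at tf_base up to tf_min occurrences, then sqrt growth (assumed formula)."""
    if freq == 0:
        return 0.0
    if freq <= tf_min:
        return float(tf_base)
    return math.sqrt(freq + tf_base * tf_base - tf_min)

def hyperbolic_tf(freq, tf_min, tf_max, base, x_offset):
    """S-shaped curve bounded by tf_min and tf_max, centered at x_offset (assumed formula)."""
    if freq == 0:
        return 0.0
    x = freq - x_offset
    # (base**x - base**-x) / (base**x + base**-x) equals tanh(x * ln(base));
    # using tanh avoids overflow for large |x|. Requires base > 0.
    return tf_min + (tf_max - tf_min) / 2.0 * (math.tanh(x * math.log(base)) + 1.0)

# Hypothetical parameter choices, for illustration only.
print(baseline_tf(2, 1.5, 3))                 # on the flat part: prints 1.5
print(hyperbolic_tf(10, 0.0, 2.0, 1.3, 10))   # at the midpoint: prints 1.0
```

    The baseline variant keeps growing without bound as the frequency increases, whereas the hyperbolic variant saturates at its maximum, which is the main consideration when a subclass chooses between them.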

    TermStats

    Holder for a term along with its statistics (DocFreq and TotalTermFreq).

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.