Namespace Lucene.Net.Misc
Miscellaneous index tools.
Classes
GetTermInfo
Utility to get the document frequency and the total number of occurrences (the sum of the term's tf over every document) of a term.
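To make the two statistics concrete, here is a minimal sketch over a hypothetical in-memory postings list: a dict mapping doc ID to the term's frequency in that document. This toy structure and the `term_info` function are assumptions for illustration, not the Lucene.Net API. GetTermInfo reads the same numbers from a real index.

```python
from typing import Dict, Tuple


def term_info(postings: Dict[int, int]) -> Tuple[int, int]:
    """Return (doc_freq, total_term_freq) for one term.

    `postings` is a hypothetical toy structure: doc ID -> tf of the term in that doc.
    """
    # Document frequency: number of documents containing the term at least once.
    doc_freq = sum(1 for tf in postings.values() if tf > 0)
    # Total term frequency: sum of the per-document tf values.
    total_term_freq = sum(postings.values())
    return doc_freq, total_term_freq


# The term occurs 3 times in doc 0, once in doc 2, and twice in doc 5.
print(term_info({0: 3, 2: 1, 5: 2}))  # (3, 6)
```

The two numbers diverge whenever a term repeats inside a document: three documents contain the term, but it occurs six times in total.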
HighFreqTerms
The HighFreqTerms class extracts the top n most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency.
If the -t flag is given, both document frequency and total tf (total number of occurrences) are reported, ordered by descending total tf.
HighFreqTerms.DocFreqComparer
Compares terms by DocFreq.
HighFreqTerms.TotalTermFreqComparer
Compares terms by TotalTermFreq.
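The selection HighFreqTerms performs, together with the two orderings its comparers define, can be sketched as follows. Everything here is a hypothetical stand-in: the toy inverted index (term -> {doc ID: tf}), the `TermStats` record, the key functions, and the `by_total_tf` switch that plays the role of the -t flag. None of these names are the Lucene.Net API.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TermStats:
    """Toy stand-in for a term plus its statistics (field names are assumptions)."""
    term: str
    doc_freq: int
    total_term_freq: int


def doc_freq_key(ts: TermStats) -> int:
    # Analogue of HighFreqTerms.DocFreqComparer: order by document frequency.
    return ts.doc_freq


def total_term_freq_key(ts: TermStats) -> int:
    # Analogue of HighFreqTerms.TotalTermFreqComparer: order by total tf.
    return ts.total_term_freq


def high_freq_terms(index: Dict[str, Dict[int, int]], n: int,
                    by_total_tf: bool = False) -> List[TermStats]:
    """Return the top n terms in descending order of the chosen statistic.

    `index` is a hypothetical toy inverted index: term -> {doc ID: tf > 0}.
    """
    stats = (
        TermStats(term, len(postings), sum(postings.values()))
        for term, postings in index.items()
    )
    key = total_term_freq_key if by_total_tf else doc_freq_key
    # nlargest keeps only n candidates instead of fully sorting every term.
    return heapq.nlargest(n, stats, key=key)


index = {
    "apple": {0: 1, 1: 1, 2: 1},  # in 3 docs, 3 occurrences
    "banana": {0: 9},             # in 1 doc, 9 occurrences
    "cherry": {1: 2, 2: 2},       # in 2 docs, 4 occurrences
}
print([ts.term for ts in high_freq_terms(index, 2)])                    # ['apple', 'cherry']
print([ts.term for ts in high_freq_terms(index, 2, by_total_tf=True)])  # ['banana', 'cherry']
```

The example shows why the two orderings differ: "banana" is the most frequent term by total tf but the rarest by document frequency.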
IndexMergeTool
Merges the indices specified on the command line into the index named by the first command-line argument.
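The argument convention can be sketched below, along with a toy merge that shows the doc ID renumbering needed when separate indices are combined. The sketch does not reflect how Lucene.Net performs the merge internally. The toy index type, `split_args`, and `merge_indexes` are hypothetical, and each source must report its document count explicitly.

```python
from typing import Dict, List, Tuple

# Toy inverted index (an assumption for illustration): term -> {doc ID: tf}.
Index = Dict[str, Dict[int, int]]


def split_args(argv: List[str]) -> Tuple[str, List[str]]:
    """First argument names the merged (target) index; the rest name source indices."""
    if len(argv) < 2:
        raise ValueError("need a target index and at least one source index")
    return argv[0], argv[1:]


def merge_indexes(sources: List[Tuple[Index, int]]) -> Index:
    """Merge (index, doc_count) pairs into one index.

    Each source's doc IDs are shifted by the number of documents already
    merged, so documents from different sources never collide.
    """
    merged: Index = {}
    doc_base = 0
    for index, doc_count in sources:
        for term, postings in index.items():
            target = merged.setdefault(term, {})
            for doc_id, tf in postings.items():
                target[doc_base + doc_id] = tf
        doc_base += doc_count
    return merged


target, paths = split_args(["merged", "idx1", "idx2"])
a = {"x": {0: 1}, "y": {1: 2}}  # 2 documents
b = {"x": {0: 5}}               # 1 document
print(target, paths)                       # merged ['idx1', 'idx2']
print(merge_indexes([(a, 2), (b, 1)]))     # {'x': {0: 1, 2: 5}, 'y': {1: 2}}
```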
SweetSpotSimilarity
A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.
For lengthNorm, a min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off as a sqrt function.
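One plateau-shaped norm consistent with this description is sketched below. Its shape is modeled on the formula in Java Lucene's SweetSpotSimilarity, which Lucene.Net ports. The function name and the parameters `ln_min`, `ln_max`, and `steepness` are assumptions, and the sketch ignores details such as norm encoding, so check the Lucene.Net source before relying on exact values.

```python
import math


def plateau_length_norm(num_terms: int, ln_min: int, ln_max: int,
                        steepness: float) -> float:
    """Norm that is 1.0 for ln_min <= num_terms <= ln_max and decays outside.

    |x - min| + |x - max| - (max - min) is 0 inside the plateau and grows by 2
    for every term of distance outside it; 1/sqrt(...) gives the drop-off.
    """
    distance = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * distance + 1.0)


print(plateau_length_norm(4, 3, 5, 0.5))  # 1.0 (inside the plateau)
print(plateau_length_norm(1, 3, 5, 0.5))  # 1/sqrt(3), about 0.577
print(plateau_length_norm(7, 3, 5, 0.5))  # same as above: the decay is symmetric
```

A larger `steepness` makes the norm fall faster once a field is shorter than the min or longer than the max.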
For tf, the baselineTf and hyperbolicTf functions are provided; subclasses can choose between them.
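The two tf curves can be sketched as below. They are modeled on the baselineTf and hyperbolicTf functions of Java Lucene's SweetSpotSimilarity, but the function and parameter names here are assumptions, not the Lucene.Net signatures. One deliberate substitution: the hyperbolic curve uses `math.tanh`, which equals the ratio (b^x - b^-x) / (b^x + b^-x) algebraically and avoids overflow for large frequencies.

```python
import math


def baseline_tf(freq: float, base: float, tf_min: float) -> float:
    """Flat at tf_min for up to `base` occurrences, then grows like a sqrt."""
    if freq == 0.0:
        return 0.0
    if freq <= base:
        return tf_min
    return math.sqrt(freq + base * base - base)


def hyperbolic_tf(freq: float, tf_min: float, tf_max: float,
                  base: float, x_offset: float) -> float:
    """S-shaped curve rising from tf_min toward tf_max, centred at x_offset.

    Requires base > 0. (b^x - b^-x) / (b^x + b^-x) == tanh(x * ln b), in (-1, 1).
    """
    if freq == 0.0:
        return 0.0
    x = freq - x_offset
    t = math.tanh(x * math.log(base))
    return tf_min + (tf_max - tf_min) / 2.0 * (t + 1.0)


print(baseline_tf(1, 2, 1.0))                     # 1.0 (still on the flat part)
print(baseline_tf(3, 2, 1.0))                     # sqrt(5), about 2.236
print(hyperbolic_tf(10, 0.0, 2.0, 1.3, 10.0))     # 1.0 (midpoint at x_offset)
```

baselineTf keeps rewarding extra occurrences without limit, while hyperbolicTf saturates at tf_max. That difference is what a subclass chooses between.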
TermStats
Holder for a term along with its statistics (DocFreq and TotalTermFreq).