Namespace Lucene.Net.Sandbox.Queries
Additional queries (some may have caveats or limitations)
Classes
DuplicateFilter
Filter to remove duplicate values from search results.
WARNING: for this to work correctly, you may have to wrap your reader, as it cannot currently deduplicate across different index segments.
FuzzyLikeThisQuery
Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms.
In effect this mixes the behaviour of Fuzzy
For each source term the fuzzy variants are held in a Boolean
SlowFuzzyQuery
Implements the classic fuzzy search query. The similarity measurement is based on the Levenshtein (edit distance) algorithm.
Note that, unlike Fuzzy
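The Levenshtein measurement mentioned above can be illustrated with a minimal, language-neutral sketch. This is Python, not the Lucene.NET API, and it is not the library's implementation; `levenshtein` and `brute_force_fuzzy_terms` are hypothetical names introduced only for this example. It shows the general idea of a brute-force fuzzy match: compute the edit distance against every candidate term and keep those within a limit.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def brute_force_fuzzy_terms(query_term, terms, max_edits):
    """Hypothetical helper: scan every term and keep those whose
    edit distance to query_term is at most max_edits."""
    return [t for t in terms if levenshtein(query_term, t) <= max_edits]


if __name__ == "__main__":
    print(brute_force_fuzzy_terms("lucene", ["lucene", "lucent", "lucerne", "solr"], 1))
```

Because every term is compared, the cost grows with the size of the term dictionary, which is consistent with the "slow" in the class name.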
SlowFuzzyTermsEnum
Potentially slow fuzzy Terms
If the minSimilarity or maxEdits is greater than the Automaton's
allowable range, this backs off to the classic (brute force)
fuzzy terms enum method by calling Get
Term enumerations are always ordered by Comparer. Each term in the enumeration is greater than all that precede it.
SortedSetSortField
SortField for Sorted
A Sorted
By default, the minimum value in the set is selected as the sort value, but this can be customized. Selectors other than the default have some limitations (see below) so that every selection happens in constant time.
Like sorting by string, this also supports sorting missing values as first or last,
via Missing
Limitations:
- Fields containing System.Int32.MaxValue or more unique values are unsupported.
- Selectors other than the default MIN require optional codec support. However, several codecs provided by Lucene, including the current default codec, support this.
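The selection behaviour described for SortedSetSortField can be sketched in a language-neutral way. The following is Python, not the Lucene.NET API; `sort_key`, `selector`, and `missing_last` are hypothetical names chosen for this illustration. It assumes each document carries a set of string values, picks one value per document (the minimum by default), and places documents with no values first or last.

```python
def sort_key(values, selector=min, missing_last=True):
    """Build a sort key for one document's set of values.

    selector picks the representative value (min by default).
    Documents with an empty set sort first or last depending on
    missing_last.
    """
    if not values:
        return (1 if missing_last else -1, "")
    return (0, selector(values))


docs = {
    "d1": {"pear", "apple"},
    "d2": {"banana"},
    "d3": set(),  # document without the field
}

by_min = sorted(docs, key=lambda d: sort_key(docs[d]))
by_max = sorted(docs, key=lambda d: sort_key(docs[d], selector=max))
missing_first = sorted(docs, key=lambda d: sort_key(docs[d], missing_last=False))

if __name__ == "__main__":
    print(by_min, by_max, missing_first)
```

With the default selector, "d1" sorts by "apple"; with the maximum selector it sorts by "pear", which changes its position relative to "d2".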
Enums
KeepMode
KeepMode determines which document id to consider as the master, all others being identified as duplicates. Selecting the "first occurrence" can potentially save on IO.
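As an illustration of the "master" choice, here is a small Python sketch (not the Lucene.NET API). `master_doc_ids` and `keep_last` are hypothetical names, and the input is assumed to be (document id, field value) pairs in increasing document id order.

```python
def master_doc_ids(docs, keep_last=False):
    """Return the document ids treated as masters, one per distinct value.

    docs: iterable of (doc_id, value) pairs in increasing doc_id order.
    keep_last=False keeps the first occurrence of each value,
    keep_last=True keeps the last occurrence.
    """
    masters = {}
    for doc_id, value in docs:
        # Keeping the first occurrence never overwrites an entry;
        # keeping the last occurrence always overwrites.
        if keep_last or value not in masters:
            masters[value] = doc_id
    return sorted(masters.values())


docs = [(0, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "b")]

if __name__ == "__main__":
    print(master_doc_ids(docs))
    print(master_doc_ids(docs, keep_last=True))
```

In this sketch the first-occurrence variant can decide on a value as soon as it is first seen, which loosely mirrors the IO saving mentioned above.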
ProcessingMode
"Full" processing mode starts by setting all bits to false and only setting bits for documents that contain the given field and are identified as none-duplicates.
"Fast" processing sets all bits to true then unsets all duplicate docs found for the given field. This approach avoids the need to read DocsEnum for terms that are seen to have a document frequency of exactly "1" (i.e. no duplicates). While a potentially faster approach , the downside is that bitsets produced will include bits set for documents that do not actually contain the field given.
Selector
Selects a value from the document's set to use as the sort value