Namespace Lucene.Net.Search.Suggest.Analyzing

Analyzer-based autosuggest.

Classes

AnalyzingInfixSuggester

Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This also highlights the tokens that match.

This suggester supports payloads. Matches are sorted only by the suggest weight; it would be nice to support blended score + weight sort in the future. This means this suggester best applies when there is a strong a-priori ranking of all the suggestions.
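The matching and sorting rules can be made concrete with a minimal in-memory sketch. It is not the Lucene.NET implementation, which builds and searches a Lucene index, and `InfixSketch`, `Entry` and `Lookup` are hypothetical names. The sketch makes three assumptions. The analyzer only lowercases the text and splits it on non-alphanumerics. Every query token except the last must match an indexed token exactly, while the last only needs to be a prefix of one. Highlighting is applied to the analyzed tokens, not to the original text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified in-memory model of infix suggestion; NOT how Lucene.NET implements it.
public static class InfixSketch
{
    public record Entry(string Text, long Weight);

    // Stand-in analyzer: lowercase, then split on anything that is not a letter or digit.
    static string[] Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(t => t.Length > 0)
             .ToArray();

    // Assumption: all query tokens but the last must match an indexed token exactly;
    // the last query token only has to be a prefix of some indexed token.
    public static List<(string Text, string Highlighted, long Weight)> Lookup(
        IEnumerable<Entry> entries, string query, int num)
    {
        string[] q = Analyze(query);
        if (q.Length == 0) return new();
        string[] exact = q[..^1];
        string prefix = q[^1];

        var hits = new List<(string Text, string Highlighted, long Weight)>();
        foreach (var e in entries)
        {
            string[] tokens = Analyze(e.Text);
            bool exactOk = exact.All(t => tokens.Contains(t));
            bool prefixOk = tokens.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));
            if (!exactOk || !prefixOk) continue;

            // Wrap whole-token matches, and the matched prefix of prefix matches, in <b> tags.
            var marked = tokens.Select(t =>
                exact.Contains(t) ? $"<b>{t}</b>"
                : t.StartsWith(prefix, StringComparison.Ordinal) ? $"<b>{prefix}</b>{t[prefix.Length..]}"
                : t);
            hits.Add((e.Text, string.Join(" ", marked), e.Weight));
        }

        // Sorted only by weight; how well the text matches plays no role in the order.
        return hits.OrderByDescending(h => h.Weight).Take(num).ToList();
    }
}
```

Consider the entries "lend me your ear" (weight 8), "a penny saved is a penny earned" (weight 10) and "ear infection" (weight 2). The query "ear" matches all three, including the infix hit inside "earned". The results come back in pure weight order, so the penny entry is first, highlighted as `a penny saved is a penny <b>ear</b>ned`.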

This suggester supports contexts; however, the contexts must be valid UTF-8 (arbitrary binary terms will not work).

Note

This API is experimental and might change in incompatible ways in the next release.

AnalyzingInfixSuggesterIndexWriterConfigFactory

Default Lucene.Net.Index.IndexWriterConfig factory for AnalyzingInfixSuggester.

AnalyzingSuggester

Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).

This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could produce the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so construct the suggester with the preservePositionIncrements parameter set to false.

If a SynonymFilter is used to map "wifi" and "wireless network" to "hotspot", then the partial text "wirele..." could suggest "wifi router". Token normalization such as stemming or accent removal likewise lets suggestions ignore those variations.

When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
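Here is a minimal sketch of analyzed-form lookup. It assumes a toy analyzer that lowercases the text and removes a hypothetical four-word stop list, and it uses a plain list in place of the weighted FST. Matching and tie-breaking use the analyzed form, but the surface form is what gets returned. The toy analyzer leaves no trace of a removed "of", so it behaves like preservePositionIncrements set to false. `AnalyzingSketch` and its members are hypothetical, not Lucene.NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzingSketch
{
    static readonly HashSet<string> StopWords = new() { "a", "an", "of", "the" };

    // Toy analyzer: lowercase, split on non-alphanumerics, drop stop words.
    // Removed stop words leave no gap (like preservePositionIncrements = false).
    public static string Analyze(string text) =>
        string.Join(" ", Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
                              .Where(t => t.Length > 0 && !StopWords.Contains(t)));

    public static List<string> Lookup(
        IEnumerable<(string Surface, long Weight)> entries, string query, int num)
    {
        string key = Analyze(query);
        if (key.Length == 0) return new List<string>(); // empty analyzed query: no results
        return entries
            .Select(e => (e.Surface, e.Weight, Analyzed: Analyze(e.Surface)))
            .Where(e => e.Analyzed.StartsWith(key, StringComparison.Ordinal)) // match on analyzed form
            .OrderByDescending(e => e.Weight)
            .ThenBy(e => e.Analyzed, StringComparer.Ordinal)                   // tie-break on analyzed form
            .Take(num)
            .Select(e => e.Surface)                                            // but return the surface form
            .ToList();
    }
}
```

In this sketch, "ghost chr" finds "The Ghost of Christmas Past". "christmas" does not find it, because lookup matches a prefix of the whole analyzed form rather than any token. "ghost" returns "The Ghost of Christmas Past" ahead of "Ghost Rider": both have the same weight, and their analyzed forms sort in that order.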

There are some limitations:

- A lookup for a query like "net" in English is no different from one for "net " (i.e., the user added a trailing space), because analyzers don't record whether they have seen a token separator after the last token.
- Suppose you're using Lucene.Net.Analysis.Core.StopFilter and the user intends to type "fast apple" but has so far typed only "fast a". The analyzer again cannot convey whether a token separator followed the "a", so Lucene.Net.Analysis.Core.StopFilter removes that "a", producing far more matches than you'd expect.
- Lookups with the empty string return no results instead of all results.
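The first two limitations come from analysis output being a bare token list. A small sketch makes this visible. It assumes a toy tokenizer followed by a plain stop filter, not the Lucene.NET analysis chain, and `LimitationSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LimitationSketch
{
    static readonly HashSet<string> StopWords = new() { "a", "an", "the" };

    // Toy analysis chain: tokenize, then remove stop words.
    // The result is only a token list; whether the input ended with a separator is lost.
    public static string[] Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(t => t.Length > 0 && !StopWords.Contains(t))
             .ToArray();
}
```

"net" and "net " both analyze to the single token "net". "fast a" analyzes to just "fast", so a suggester fed this output would treat the half-typed "a" as if it had never been typed.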

Note

This API is experimental and might change in incompatible ways in the next release.

BlendedInfixSuggester

Extension of AnalyzingInfixSuggester that transforms the weight after search to take into account the position of the searched term in the indexed text. Note that it increases the number of elements searched and applies the weighting afterwards, which might be costly for long suggestions.
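The re-ranking step can be sketched with two assumed decay curves: a linear decay 1 - 0.10 * position (clamped at zero here) and a reciprocal decay 1 / (position + 1). Position is the 0-based index of the first matching token. The 0.10 slope, the clamp, and every name (`BlendSketch`, `Decay`, `Rerank`) are assumptions for illustration. Consult BlendedInfixSuggester.BlenderType for the options the library actually offers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlendSketch
{
    // Hypothetical stand-in for BlendedInfixSuggester.BlenderType.
    public enum Decay { Linear, Reciprocal }

    const double LinearSlope = 0.10; // assumed slope, for illustration

    // position: 0-based index, within the suggestion, of the first token that matched the query.
    public static double Coefficient(Decay decay, int position) => decay switch
    {
        Decay.Linear => Math.Max(0.0, 1.0 - LinearSlope * position), // clamped at 0 in this sketch
        Decay.Reciprocal => 1.0 / (position + 1),
        _ => throw new ArgumentOutOfRangeException(nameof(decay)),
    };

    // Candidates come from an over-fetched search (more hits than num);
    // scale each weight by the position coefficient, re-sort, and keep the top num.
    public static List<(string Text, long Blended)> Rerank(
        IEnumerable<(string Text, long Weight, int Position)> candidates, Decay decay, int num) =>
        candidates
            .Select(c => (c.Text, Blended: (long)(c.Weight * Coefficient(decay, c.Position))))
            .OrderByDescending(c => c.Blended)
            .Take(num)
            .ToList();
}
```

Take a weight-100 hit at position 0 and a weight-150 hit at position 1. Linear decay gives the second hit about 150 * 0.9, so the heavier hit stays on top. Reciprocal decay gives it 150 * 0.5 = 75, which flips the order. Re-ranking needs more candidates than are finally returned, which is where the extra search cost comes from.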

Note

This API is experimental and might change in incompatible ways in the next release.

FSTUtil

Exposes a utility method to enumerate all paths intersecting a Lucene.Net.Util.Automaton.Automaton with a Lucene.Net.Util.Fst.FST.

FSTUtil.Path<T>

Holds a pair (automaton, fst) of states and accumulated output in the intersected machine.
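Prefix-path intersection can be sketched with two stand-ins: a character-labeled DFA for the Automaton, and a trie whose arcs carry additive integer outputs for the FST. Both machines are walked in lockstep from their start states. Each surviving pair is a path comparable to FSTUtil.Path<T>, and expansion stops as soon as the automaton accepts, which suits an automaton that describes a prefix. All types here (`IntersectSketch`, `Dfa`, `TrieNode`, `IntersectPath`) are hypothetical. Real Lucene automata label transitions with ranges, and FST outputs need not be integers.

```csharp
using System.Collections.Generic;

public static class IntersectSketch
{
    // Stand-in for an FST: a trie whose arcs carry an integer output; outputs add along a path.
    public sealed class TrieNode
    {
        public readonly Dictionary<char, (TrieNode Next, int Output)> Arcs = new();
    }

    // Stand-in for an automaton: a DFA as a transition table plus accept states (start state 0).
    public sealed class Dfa
    {
        public readonly Dictionary<(int State, char Label), int> Transitions = new();
        public readonly HashSet<int> Accept = new();
    }

    // Analogous to FSTUtil.Path<T>: a pair of states plus the accumulated output and input.
    public record IntersectPath(int DfaState, TrieNode FstNode, int Output, string Input);

    public static List<IntersectPath> IntersectPrefixPaths(Dfa dfa, TrieNode root)
    {
        var results = new List<IntersectPath>();
        var pending = new Queue<IntersectPath>();
        pending.Enqueue(new IntersectPath(0, root, 0, ""));
        while (pending.Count > 0)
        {
            var p = pending.Dequeue();
            if (dfa.Accept.Contains(p.DfaState))
            {
                results.Add(p); // the automaton accepted this prefix; report it, do not expand further
                continue;
            }
            // Follow only arcs whose label the DFA can also consume from its current state.
            foreach (var (label, arc) in p.FstNode.Arcs)
            {
                if (dfa.Transitions.TryGetValue((p.DfaState, label), out int next))
                    pending.Enqueue(new IntersectPath(next, arc.Next, p.Output + arc.Output, p.Input + label));
            }
        }
        return results;
    }
}
```

Consider a trie holding "cat" and "d" and a DFA accepting exactly the prefix "ca". The intersection yields one path with input "ca", and its output is the sum of the first two arc outputs. A caller would then enumerate completions from the trie node where that path ends.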

FreeTextSuggester

Builds an ngram model from the text sent to Build(IInputEnumerator, double) and predicts based on the last grams-1 tokens in the request sent to DoLookup(string, IEnumerable<BytesRef>, bool, int). This tries to handle the "long tail" of suggestions for when the incoming query is a never before seen query string.

Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.

Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").

This uses the stupid backoff language model to smooth scores across ngram models; see "Large language models in machine translation" for details.
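Stupid backoff can be sketched with bigrams (grams = 2). If the pair (previous token, candidate) was seen in training, the score is the pair's count divided by the previous token's count. Otherwise the model backs off to the candidate's unigram frequency times a constant alpha. The sketch assumes alpha = 0.4, the value recommended in the cited paper. The scores are not normalized probabilities, which is what makes the method "stupid". The whitespace tokenizer and the `StupidBackoffSketch` class are hypothetical, and the partially typed last token is treated as a prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class StupidBackoffSketch
{
    const double Alpha = 0.4; // back-off factor; assumed value

    readonly Dictionary<string, int> unigrams = new();
    readonly Dictionary<(string, string), int> bigrams = new();
    int totalTokens;

    // Count unigrams and bigrams from whitespace-separated, lowercased sentences.
    public void Train(IEnumerable<string> sentences)
    {
        foreach (var s in sentences)
        {
            var toks = s.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < toks.Length; i++)
            {
                unigrams[toks[i]] = unigrams.GetValueOrDefault(toks[i]) + 1;
                totalTokens++;
                if (i > 0)
                {
                    var key = (toks[i - 1], toks[i]);
                    bigrams[key] = bigrams.GetValueOrDefault(key) + 1;
                }
            }
        }
    }

    // S(word | prev) = count(prev word) / count(prev) if seen, else Alpha * count(word) / N.
    public double Score(string prev, string word)
    {
        if (bigrams.TryGetValue((prev, word), out int pairCount))
            return (double)pairCount / unigrams[prev];
        return Alpha * unigrams.GetValueOrDefault(word) / totalTokens;
    }

    // Rank known words starting with the partial last token, given the previous token.
    public List<(string Word, double Score)> Suggest(string prev, string partial, int num) =>
        unigrams.Keys
            .Where(w => w.StartsWith(partial, StringComparison.Ordinal))
            .Select(w => (Word: w, Score: Score(prev, w)))
            .OrderByDescending(x => x.Score)
            .ThenBy(x => x.Word, StringComparer.Ordinal)
            .Take(num)
            .ToList();
}
```

Suppose the model is trained on "the quick brown fox", "the quick red fox" and "a lazy brown dog". After "quick", the seen continuations "brown" and "red" each score 1/2. Every other word falls back to a much smaller unigram-based score.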

From DoLookup(string, IEnumerable<BytesRef>, bool, int), the key of each result is the ngram token; the value is long.MaxValue * score (fixed point, cast to long). Divide by long.MaxValue to get the score back, which ranges from 0.0 to 1.0.
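The fixed-point encoding can be sketched as a pair of helpers. One C# subtlety applies: (double)long.MaxValue rounds up to 2^63, so a score of exactly 1.0 produces a product that does not fit in a long. The C# specification leaves the result of such an out-of-range cast unspecified in an unchecked context, so the sketch clamps that case. `FixedPointScore` is a hypothetical helper, not part of Lucene.NET.

```csharp
using System;

public static class FixedPointScore
{
    // value = long.MaxValue * score, for a score in [0, 1].
    public static long Encode(double score)
    {
        if (score < 0.0 || score > 1.0) throw new ArgumentOutOfRangeException(nameof(score));
        // (double)long.MaxValue == 2^63, so 1.0 would overflow the cast; clamp explicitly.
        if (score >= 1.0) return long.MaxValue;
        return (long)(long.MaxValue * score);
    }

    // Recover the score by dividing by long.MaxValue.
    public static double Decode(long value) => value / (double)long.MaxValue;
}
```

Round-tripping loses a little precision because the division is done in double arithmetic. For ranking purposes that is normally harmless.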

The onlyMorePopular parameter is unused.

Note

This API is experimental and might change in incompatible ways in the next release.

FuzzySuggester

Implements a fuzzy AnalyzingSuggester. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false for the transpositions parameter.

At most, this query will match terms up to Lucene.Net.Util.Automaton.LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE edits; higher distances are not supported. Note that the fuzzy distance is measured in "byte space" on the bytes returned by the Lucene.Net.Analysis.TokenStream's Lucene.Net.Analysis.TokenAttributes.ITermToBytesRefAttribute, usually UTF-8. By default, the analyzed bytes must be at least 3 bytes long (DEFAULT_MIN_FUZZY_LENGTH) before any edits are considered. Furthermore, the first byte (DEFAULT_NON_FUZZY_PREFIX, 1 by default) may not be edited, and at most 1 edit (DEFAULT_MAX_EDITS) is allowed. If the unicodeAware constructor parameter is set to true, maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix are measured in Unicode code points (actual letters) instead of bytes.
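The distance computation can be sketched with the standard dynamic-programming formulation of optimal string alignment (restricted Damerau-Levenshtein). It runs over integer symbols so the same code can compare either UTF-8 bytes or Unicode code points. The sketch ignores the minimum fuzzy length, the non-fuzzy prefix, and the prefix-style matching the suggester performs. `FuzzySketch` and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class FuzzySketch
{
    // Optimal string alignment distance over arbitrary integer symbols.
    // With transpositions == false this reduces to classic Levenshtein distance.
    public static int Distance(int[] a, int[] b, bool transpositions = true)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
                // Adjacent swap counts as a single edit.
                if (transpositions && i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
            }
        }
        return d[a.Length, b.Length];
    }

    // "Byte space": one symbol per UTF-8 byte.
    public static int[] Utf8Units(string s) => Encoding.UTF8.GetBytes(s).Select(b => (int)b).ToArray();

    // "Unicode aware": one symbol per code point (throws on unpaired surrogates).
    public static int[] CodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
            result.Add(char.ConvertToUtf32(s, i));
        return result.ToArray();
    }
}
```

"ab" versus "ba" costs 1 edit with transpositions and 2 without. "cafe" versus "café" costs 2 edits in byte space, because é is two UTF-8 bytes, but only 1 edit in code points. That is why unicodeAware changes how far a fixed edit budget stretches for non-ASCII text.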

NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.

Note: complex query analyzers can significantly slow down lookups. To keep the complexity of the prefix intersection low, avoid query analyzers that drop or inject terms (such as synonyms). At index time, complex analyzers can safely be used.

Note

This API is experimental and might change in incompatible ways in the next release.

SuggestStopFilter

Like Lucene.Net.Analysis.Core.StopFilter, except that it will not remove the last token if that token was not followed by a token separator. For example, in the query "find the", the "the" is preserved because it was not followed by a space, punctuation, or similar separator, and it is marked KEYWORD so that later stemmers won't touch it either. A query like "find the popsicle", by contrast, would have "the" removed as a stopword.

Normally you'd use the ordinary Lucene.Net.Analysis.Core.StopFilter in your indexAnalyzer and then this class in your queryAnalyzer, when using one of the analyzing suggesters.
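The filtering rule can be sketched with a toy tokenizer and a hypothetical four-word stop list. The sketch decides whether a separator followed the last token by checking whether the query's final character is a letter or digit. This is a simplification, since Lucene's filter derives that from token offsets rather than by inspecting characters. `SuggestStopSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SuggestStopSketch
{
    static readonly HashSet<string> StopWords = new() { "a", "an", "of", "the" };

    // Returns the surviving tokens. A trailing stop word survives, flagged as a keyword,
    // when no separator follows it, since the user may still be typing that word.
    public static List<(string Term, bool Keyword)> Filter(string query)
    {
        var tokens = Regex.Split(query.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
                          .Where(t => t.Length > 0)
                          .ToList();
        bool endsWithSeparator = query.Length > 0 && !char.IsLetterOrDigit(query[query.Length - 1]);

        var output = new List<(string Term, bool Keyword)>();
        for (int i = 0; i < tokens.Count; i++)
        {
            bool isLast = i == tokens.Count - 1;
            if (!StopWords.Contains(tokens[i])) output.Add((tokens[i], false));
            else if (isLast && !endsWithSeparator) output.Add((tokens[i], true));
            // Any other stop word is dropped, as an ordinary StopFilter would do.
        }
        return output;
    }
}
```

"find the" keeps "the" as a keyword, "find the " (with a trailing space) drops it, and "find the popsicle" drops it as an ordinary stopword.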

Interfaces

IAnalyzingInfixSuggesterIndexWriterConfigFactory

Generic interface that can be used to customize the index writer to be used by AnalyzingInfixSuggester.

This interface is specific to Lucene.NET, where factory classes are used to allow customization instead of making virtual method calls from the constructor.
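A common reason to avoid virtual calls from a constructor can be shown with hypothetical types; none of these names are Lucene.NET APIs. The base constructor runs before the derived constructor body, so an override invoked from it sees fields that are not yet assigned. Passing a factory into the constructor avoids running subclass code early, at the cost of one extra type.

```csharp
using System;

// Anti-pattern: a virtual call from the base constructor.
public class BrokenBase
{
    public string Config { get; }

    protected BrokenBase()
    {
        Config = CreateConfig(); // runs the override before the derived constructor body
    }

    protected virtual string CreateConfig() => "default";
}

public sealed class BrokenDerived : BrokenBase
{
    private readonly string name;

    public BrokenDerived(string name) { this.name = name; }

    // Called from BrokenBase's constructor, while 'name' is still null.
    protected override string CreateConfig() => "config for " + (name ?? "<null>");
}

// Factory pattern: customization is injected instead of overridden.
public sealed class WriterSettings
{
    public double RamBufferMB { get; init; } = 16.0; // illustrative default only
}

public interface IWriterSettingsFactory
{
    WriterSettings Get();
}

public sealed class DefaultWriterSettingsFactory : IWriterSettingsFactory
{
    public WriterSettings Get() => new WriterSettings();
}

public sealed class LargeBufferFactory : IWriterSettingsFactory
{
    public WriterSettings Get() => new WriterSettings { RamBufferMB = 256.0 };
}

public class SuggesterLike
{
    public WriterSettings Settings { get; }

    public SuggesterLike() : this(new DefaultWriterSettingsFactory()) { }

    // Only the injected, already-constructed factory is called here;
    // no override on a subclass of SuggesterLike runs before its constructor finishes.
    public SuggesterLike(IWriterSettingsFactory factory)
    {
        if (factory is null) throw new ArgumentNullException(nameof(factory));
        Settings = factory.Get();
    }
}
```

`new BrokenDerived("x").Config` evaluates to "config for <null>". `SuggesterLike`, in contrast, receives its settings from whichever factory is passed in and falls back to a default one.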

Enums

BlendedInfixSuggester.BlenderType

The available blender types.

SuggesterOptions

LUCENENET specific type for specifying AnalyzingSuggester and FuzzySuggester options.