Class FreeTextSuggester
Builds an ngram model from the text sent to Build(IInputEnumerator, double) and predicts based on the last grams-1 tokens in the request sent to DoLookup(string, IEnumerable<BytesRef>, bool, int). This tries to handle the "long tail" of suggestions, for when the incoming query is a never-before-seen query string.
Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").
This uses the stupid backoff language model to smooth scores across ngram models; see "Large language models in machine translation" for details.
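For reference, the stupid backoff score described in that paper can be written as below, where f(·) is the observed count of an ngram, N is the total number of tokens in the training text, and α is the backoff factor (ALPHA in this class). This is the paper's general formulation; the exact bookkeeping inside FreeTextSuggester may differ.

```latex
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{f(w_{i-k+1}^{i})}{f(w_{i-k+1}^{i-1})} & \text{if } f(w_{i-k+1}^{i}) > 0 \\[1ex]
\alpha \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
```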
From DoLookup(string, IEnumerable<BytesRef>, bool, int), the key of each result is the ngram token; the value is long.MaxValue * score (fixed point, cast to long). Divide by long.MaxValue to get the score back, which ranges from 0.0 to 1.0.
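As a minimal sketch of the conversion described above, the following helper turns a fixed-point value back into a score. The class and method names (SuggestScore, Decode) are hypothetical and not part of Lucene.NET; the only assumption is that the result's value is available as a long.

```csharp
using System;

public static class SuggestScore
{
    // Hypothetical helper: converts the fixed-point value of a lookup
    // result back into a floating-point score in the range 0.0 to 1.0.
    public static double Decode(long fixedPointValue)
    {
        // Divide in double precision; integer division would truncate to 0.
        return fixedPointValue / (double)long.MaxValue;
    }
}
```

The cast to double matters: dividing two long values would use integer division and lose the fractional score.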
onlyMorePopular is unused.
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Search.Suggest.Analyzing
Assembly: Lucene.Net.Suggest.dll
Syntax
public class FreeTextSuggester : Lookup
Constructors
FreeTextSuggester(Analyzer)
Instantiate, using the provided analyzer for both indexing and lookup, with a bigram model by default.
Declaration
public FreeTextSuggester(Analyzer analyzer)
Parameters
| Type | Name | Description |
|---|---|---|
| Analyzer | analyzer | Analyzer used for both indexing and lookup. |
FreeTextSuggester(Analyzer, Analyzer)
Instantiate, using the provided indexing and lookup analyzers, with a bigram model by default.
Declaration
public FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer)
Parameters
| Type | Name | Description |
|---|---|---|
| Analyzer | indexAnalyzer | Analyzer used at indexing time. |
| Analyzer | queryAnalyzer | Analyzer used at lookup time. |
FreeTextSuggester(Analyzer, Analyzer, int)
Instantiate, using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.).
Declaration
public FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int grams)
Parameters
| Type | Name | Description |
|---|---|---|
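| Analyzer | indexAnalyzer | Analyzer used at indexing time. |
| Analyzer | queryAnalyzer | Analyzer used at lookup time. |
| int | grams | Order of the ngram model (2 = bigram, 3 = trigram, etc.). |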
FreeTextSuggester(Analyzer, Analyzer, int, byte)
Instantiate, using the provided indexing and lookup analyzers, and the specified model (2 = bigram, 3 = trigram, etc.). The separator is passed to SetTokenSeparator(string) to join multiple tokens into a single ngram token; it must be an ASCII (7-bit-clean) byte. No input token should contain this byte; otherwise an ArgumentException is thrown.
Declaration
public FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int grams, byte separator)
Parameters
| Type | Name | Description |
|---|---|---|
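| Analyzer | indexAnalyzer | Analyzer used at indexing time. |
| Analyzer | queryAnalyzer | Analyzer used at lookup time. |
| int | grams | Order of the ngram model (2 = bigram, 3 = trigram, etc.). |
| byte | separator | ASCII (7-bit-clean) byte used to join multiple tokens into a single ngram token. |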
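The two constraints on the separator can be checked up front. The sketch below is illustrative only: SeparatorSketch and its methods are hypothetical names, and comparing against the token's UTF-8 bytes is an assumption about how the check might be done, not a description of the library's internal validation.

```csharp
using System;
using System.Text;

public static class SeparatorSketch
{
    // True when the byte is plain ASCII, i.e. its high bit is not set.
    public static bool IsSevenBitClean(byte separator)
    {
        return (separator & 0x80) == 0;
    }

    // True when the token's UTF-8 encoding contains the separator byte.
    // (Assumption: tokens are compared in their UTF-8 form.)
    public static bool ContainsSeparator(string token, byte separator)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(token);
        return Array.IndexOf(utf8, separator) >= 0;
    }
}
```

A caller could reject a candidate separator when IsSevenBitClean returns false, and inspect analyzer output with ContainsSeparator before building the model.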
Fields
ALPHA
The constant used for backoff smoothing; during lookup, this means that if a given trigram did not occur, and we back off to the bigram, the overall score will be 0.4 times what the bigram model would have assigned.
Declaration
public const double ALPHA = 0.4
Field Value
| Type | Description |
|---|---|
| double | |
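The discount can be worked through with a small sketch. BackoffSketch and Discount are hypothetical names used only for illustration; the code applies the factor once per backoff step, as the description of ALPHA implies, and does not reproduce the suggester's actual scoring code.

```csharp
using System;

public static class BackoffSketch
{
    // Mirrors the documented value of FreeTextSuggester.ALPHA.
    public const double Alpha = 0.4;

    // Hypothetical helper: applies the backoff factor once for every
    // step taken down to a lower-order model.
    public static double Discount(double lowerOrderScore, int backoffSteps)
    {
        return Math.Pow(Alpha, backoffSteps) * lowerOrderScore;
    }
}
```

For example, if a trigram was never seen and the bigram model would assign 0.5, one backoff step gives 0.4 * 0.5 = 0.2; backing off a second time, to a unigram score of 0.5, would give 0.4 * 0.4 * 0.5 = 0.08.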
CODEC_NAME
Codec name used in the header for the saved model.
Declaration
public const string CODEC_NAME = "freetextsuggest"
Field Value
| Type | Description |
|---|---|
| string | |
DEFAULT_GRAMS
By default we use a bigram model.
Declaration
public const int DEFAULT_GRAMS = 2
Field Value
| Type | Description |
|---|---|
| int | |
DEFAULT_SEPARATOR
The default character used to join multiple tokens into a single ngram token. The input tokens produced by the analyzer must not contain this character.
Declaration
public const byte DEFAULT_SEPARATOR = 30
Field Value
| Type | Description |
|---|---|
| byte | |
VERSION_CURRENT
Current version of the saved model file format.
Declaration
public const int VERSION_CURRENT = 0
Field Value
| Type | Description |
|---|---|
| int | |
VERSION_START
Initial version of the saved model file format.
Declaration
public const int VERSION_START = 0
Field Value
| Type | Description |
|---|---|
| int | |
Properties
Count
Gets the number of entries the lookup was built with.
Declaration
public override long Count { get; }
Property Value
| Type | Description |
|---|---|
| long | total number of suggester entries |
Methods
Build(IInputEnumerator)
Builds up a new internal Lookup representation based on the given IInputEnumerator. The implementation might re-sort the data internally.
Declaration
public override void Build(IInputEnumerator enumerator)
Parameters
| Type | Name | Description |
|---|---|---|
| IInputEnumerator | enumerator | |
Build(IInputEnumerator, double)
Build the suggest index, using up to the specified amount of temporary RAM while building. Note that the weights for the suggestions are ignored.
Declaration
public virtual void Build(IInputEnumerator enumerator, double ramBufferSizeMB)
Parameters
| Type | Name | Description |
|---|---|---|
| IInputEnumerator | enumerator | |
| double | ramBufferSizeMB | Maximum amount of temporary RAM, in megabytes, to use while building. |
DoLookup(string, bool, int)
Look up a key and return possible completions for this key.
Declaration
public override IList<Lookup.LookupResult> DoLookup(string key, bool onlyMorePopular, int num)
Parameters
| Type | Name | Description |
|---|---|---|
| string | key | lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix. |
| bool | onlyMorePopular | return only more popular results |
| int | num | maximum number of results to return |
Returns
| Type | Description |
|---|---|
| IList<Lookup.LookupResult> | a list of possible completions, with their relative weight (e.g. popularity) |
DoLookup(string, IEnumerable<BytesRef>, bool, int)
Look up a key and return possible completions for this key.
Declaration
public override IList<Lookup.LookupResult> DoLookup(string key, IEnumerable<BytesRef> contexts, bool onlyMorePopular, int num)
Parameters
| Type | Name | Description |
|---|---|---|
| string | key | lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix. |
| IEnumerable<BytesRef> | contexts | contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match |
| bool | onlyMorePopular | return only more popular results |
| int | num | maximum number of results to return |
Returns
| Type | Description |
|---|---|
| IList<Lookup.LookupResult> | a list of possible completions, with their relative weight (e.g. popularity) |
DoLookup(string, IEnumerable<BytesRef>, int)
Retrieve suggestions.
Declaration
public virtual IList<Lookup.LookupResult> DoLookup(string key, IEnumerable<BytesRef> contexts, int num)
Parameters
| Type | Name | Description |
|---|---|---|
| string | key | |
| IEnumerable<BytesRef> | contexts | |
| int | num | |
Returns
| Type | Description |
|---|---|
| IList<Lookup.LookupResult> | |
DoLookup(string, int)
Lookup, without any context.
Declaration
public virtual IList<Lookup.LookupResult> DoLookup(string key, int num)
Parameters
| Type | Name | Description |
|---|---|---|
| string | key | |
| int | num | |
Returns
| Type | Description |
|---|---|
| IList<Lookup.LookupResult> | |
Get(string)
Returns the weight associated with an input string, or null if it does not exist.
Declaration
public virtual object Get(string key)
Parameters
| Type | Name | Description |
|---|---|---|
| string | key | |
Returns
| Type | Description |
|---|---|
| object | |
GetSizeInBytes()
Returns the byte size of the underlying FST.
Declaration
public override long GetSizeInBytes()
Returns
| Type | Description |
|---|---|
| long | |
Load(DataInput)
Discard current lookup data and load it from a previously saved copy. Optional operation.
Declaration
public override bool Load(DataInput input)
Parameters
| Type | Name | Description |
|---|---|---|
| DataInput | input | the Lucene.Net.Store.DataInput to load the lookup data from. |
Returns
| Type | Description |
|---|---|
| bool | true if completed successfully, false if unsuccessful or not supported. |
Exceptions
| Type | Condition |
|---|---|
| IOException | when a fatal I/O error occurs. |
Store(DataOutput)
Persist the constructed lookup data to a directory. Optional operation.
Declaration
public override bool Store(DataOutput output)
Parameters
| Type | Name | Description |
|---|---|---|
| DataOutput | output | Lucene.Net.Store.DataOutput to write the data to. |
Returns
| Type | Description |
|---|---|
| bool | true if successful, false if unsuccessful or not supported. |
Exceptions
| Type | Condition |
|---|---|
| IOException | when a fatal I/O error occurs. |