    Class FreeTextSuggester

    Builds an ngram model from the text sent to Build(IInputEnumerator, double) and predicts based on the last grams-1 tokens in the request sent to DoLookup(string, IEnumerable<BytesRef>, bool, int). This is intended to handle the "long tail" of suggestions, where the incoming query is a never-before-seen query string.

    Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
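    The fallback arrangement described above can be sketched roughly as follows. This is only an illustration: SuggestFallback and its delegate parameters are hypothetical names, not part of Lucene.Net, and each delegate stands in for a call to a real suggester's lookup method.

```csharp
using System;
using System.Collections.Generic;

public static class SuggestFallback
{
    // Hypothetical helper: tries the primary lookup first and only consults
    // the fallback (for example, a free-text ngram suggester) when the
    // primary returns no suggestions.
    public static IList<string> Suggest(
        string query,
        Func<string, IList<string>> primary,
        Func<string, IList<string>> fallback)
    {
        IList<string> results = primary(query);
        return results != null && results.Count > 0 ? results : fallback(query);
    }
}
```

    The fallback delegate is invoked lazily, so its cost is only paid for queries the primary suggester cannot answer.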

    Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").

    This uses the Stupid Backoff language model to smooth scores across ngram models; see "Large Language Models in Machine Translation" for details.

    From DoLookup(string, IEnumerable<BytesRef>, bool, int), the key of each result is the ngram token; the value is long.MaxValue * score (fixed point, cast to long). Divide the value by long.MaxValue to get the score back, which ranges from 0.0 to 1.0.
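    As a minimal sketch of the decoding step, assuming MaxValue here means long.MaxValue (the result value is a long), the score can be recovered as shown below. SuggestScore is a hypothetical helper name, not part of the library.

```csharp
using System;

public static class SuggestScore
{
    // Recovers the approximate score in [0.0, 1.0] from a fixed-point
    // result value by dividing by long.MaxValue (converted to double).
    public static double Decode(long value)
    {
        return (double)value / long.MaxValue;
    }
}
```

    For each result returned by a lookup, passing the result's long value to Decode yields the smoothed score; a value of 0 maps to 0.0 and long.MaxValue maps to 1.0. Precision is limited by the conversion from long to double.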

    onlyMorePopular is unused.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    Lookup
    FreeTextSuggester
    Inherited Members
    Lookup.CHARSEQUENCE_COMPARER
    Lookup.Build(IDictionary)
    Lookup.Load(Stream)
    Lookup.Store(Stream)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Search.Suggest.Analyzing
    Assembly: Lucene.Net.Suggest.dll
    Syntax
    public class FreeTextSuggester : Lookup

    Constructors

    FreeTextSuggester(Analyzer)

    Instantiates the suggester using the provided analyzer for both indexing and lookup, with a bigram model by default.

    Declaration
    public FreeTextSuggester(Analyzer analyzer)
    Parameters
    Type Name Description
    Analyzer analyzer

    FreeTextSuggester(Analyzer, Analyzer)

    Instantiates the suggester using the provided indexing and lookup analyzers, with a bigram model by default.

    Declaration
    public FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer)
    Parameters
    Type Name Description
    Analyzer indexAnalyzer
    Analyzer queryAnalyzer

    FreeTextSuggester(Analyzer, Analyzer, int)

    Instantiates the suggester using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.).

    Declaration
    public FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int grams)
    Parameters
    Type Name Description
    Analyzer indexAnalyzer
    Analyzer queryAnalyzer
    int grams

    FreeTextSuggester(Analyzer, Analyzer, int, byte)

    Instantiates the suggester using the provided indexing and lookup analyzers and the specified model (2 = bigram, 3 = trigram, etc.). The separator is passed to ShingleFilter.SetTokenSeparator(string) to join multiple tokens into a single ngram token; it must be an ASCII (7-bit-clean) byte. No input token may contain this byte; otherwise an ArgumentException is thrown.

    Declaration
    public FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int grams, byte separator)
    Parameters
    Type Name Description
    Analyzer indexAnalyzer
    Analyzer queryAnalyzer
    int grams
    byte separator
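
    To illustrate the separator constraints described for this constructor, here is a small self-contained sketch. NgramSeparator, IsAscii, and Join are hypothetical helpers written for this example; they mimic the documented rules (7-bit-clean separator, no token may contain it) and are not the library's internal implementation.

```csharp
using System;
using System.Collections.Generic;

public static class NgramSeparator
{
    // Hypothetical helper: true when the byte is 7-bit clean (0..127).
    public static bool IsAscii(byte separator)
    {
        return (separator & 0x80) == 0;
    }

    // Hypothetical helper: joins tokens into a single ngram token, rejecting
    // a non-ASCII separator and any token that already contains the separator.
    public static string Join(IReadOnlyList<string> tokens, byte separator)
    {
        if (!IsAscii(separator))
            throw new ArgumentException("separator must be a 7-bit ASCII byte", nameof(separator));

        char sep = (char)separator;
        foreach (string token in tokens)
        {
            if (token.IndexOf(sep) >= 0)
                throw new ArgumentException("token contains the separator", nameof(tokens));
        }
        return string.Join(sep.ToString(), tokens);
    }
}
```

    For example, joining the tokens "new" and "york" with separator byte 30 produces a single string with the control character U+001E between the two words.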

    Fields

    ALPHA

    The constant used for backoff smoothing. During lookup, if a given trigram did not occur and we back off to the bigram, the overall score will be 0.4 times what the bigram model would have assigned.

    Declaration
    public const double ALPHA = 0.4
    Field Value
    Type Description
    double
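
    The effect of this constant can be sketched with a toy scoring function. This is an illustration of the general Stupid Backoff idea under simplifying assumptions (ngram counts held in a dictionary keyed by space-joined tokens); StupidBackoff, Score, and their parameters are hypothetical and do not reflect how the suggester stores or scores its model internally.

```csharp
using System;
using System.Collections.Generic;

public static class StupidBackoff
{
    public const double Alpha = 0.4; // same value as the ALPHA field above

    // counts: occurrence counts keyed by space-joined ngrams (assumed layout).
    // totalTokens: total number of unigram occurrences.
    // Returns count(context + word) / count(context) for the longest context
    // that was seen, multiplied by Alpha once per backoff step.
    public static double Score(
        IReadOnlyDictionary<string, long> counts,
        long totalTokens,
        string[] context,
        string word)
    {
        double backoff = 1.0;
        for (int start = 0; start <= context.Length; start++)
        {
            string prefix = start == context.Length
                ? string.Empty
                : string.Join(" ", context, start, context.Length - start);
            string ngram = prefix.Length == 0 ? word : prefix + " " + word;

            long denominator = totalTokens;
            bool prefixSeen = prefix.Length == 0
                || (counts.TryGetValue(prefix, out denominator) && denominator > 0);

            if (prefixSeen && counts.TryGetValue(ngram, out long ngramCount) && ngramCount > 0)
            {
                return backoff * ngramCount / denominator;
            }
            backoff *= Alpha; // shorten the context and penalize the score
        }
        return 0.0; // the word was never seen at all
    }
}
```

    With counts of 4 for "new", 3 for "york", 2 for "new york", and 10 tokens in total, the score of "york" after "new" is 2/4 = 0.5, while the score of "york" after an unseen word backs off to the unigram and is about 0.4 * 3/10 = 0.12.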

    CODEC_NAME

    Codec name used in the header for the saved model.

    Declaration
    public const string CODEC_NAME = "freetextsuggest"
    Field Value
    Type Description
    string

    DEFAULT_GRAMS

    By default we use a bigram model.

    Declaration
    public const int DEFAULT_GRAMS = 2
    Field Value
    Type Description
    int

    DEFAULT_SEPARATOR

    The default character used to join multiple tokens into a single ngram token (byte value 30, i.e. 0x1E, the ASCII record separator). The input tokens produced by the analyzer must not contain this character.

    Declaration
    public const byte DEFAULT_SEPARATOR = 30
    Field Value
    Type Description
    byte

    VERSION_CURRENT

    Current version of the saved model file format.

    Declaration
    public const int VERSION_CURRENT = 0
    Field Value
    Type Description
    int

    VERSION_START

    Initial version of the saved model file format.

    Declaration
    public const int VERSION_START = 0
    Field Value
    Type Description
    int

    Properties

    Count

    Gets the number of entries the lookup was built with.

    Declaration
    public override long Count { get; }
    Property Value
    Type Description
    long

    total number of suggester entries

    Overrides
    Lookup.Count

    Methods

    Build(IInputEnumerator)

    Builds up a new internal Lookup representation based on the given IInputEnumerator. The implementation might re-sort the data internally.

    Declaration
    public override void Build(IInputEnumerator enumerator)
    Parameters
    Type Name Description
    IInputEnumerator enumerator
    Overrides
    Lookup.Build(IInputEnumerator)

    Build(IInputEnumerator, double)

    Build the suggest index, using up to the specified amount of temporary RAM while building. Note that the weights for the suggestions are ignored.

    Declaration
    public virtual void Build(IInputEnumerator enumerator, double ramBufferSizeMB)
    Parameters
    Type Name Description
    IInputEnumerator enumerator
    double ramBufferSizeMB

    DoLookup(string, bool, int)

    Looks up a key and returns possible completions for this key.

    Declaration
    public override IList<Lookup.LookupResult> DoLookup(string key, bool onlyMorePopular, int num)
    Parameters
    Type Name Description
    string key

    lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.

    bool onlyMorePopular

    return only more popular results

    int num

    maximum number of results to return

    Returns
    Type Description
    IList<Lookup.LookupResult>

    a list of possible completions, with their relative weight (e.g. popularity)

    Overrides
    Lookup.DoLookup(string, bool, int)

    DoLookup(string, IEnumerable<BytesRef>, bool, int)

    Looks up a key and returns possible completions for this key.

    Declaration
    public override IList<Lookup.LookupResult> DoLookup(string key, IEnumerable<BytesRef> contexts, bool onlyMorePopular, int num)
    Parameters
    Type Name Description
    string key

    lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.

    IEnumerable<BytesRef> contexts

    contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match

    bool onlyMorePopular

    return only more popular results

    int num

    maximum number of results to return

    Returns
    Type Description
    IList<Lookup.LookupResult>

    a list of possible completions, with their relative weight (e.g. popularity)

    Overrides
    Lookup.DoLookup(string, IEnumerable<BytesRef>, bool, int)

    DoLookup(string, IEnumerable<BytesRef>, int)

    Retrieve suggestions.

    Declaration
    public virtual IList<Lookup.LookupResult> DoLookup(string key, IEnumerable<BytesRef> contexts, int num)
    Parameters
    Type Name Description
    string key
    IEnumerable<BytesRef> contexts
    int num
    Returns
    Type Description
    IList<Lookup.LookupResult>

    DoLookup(string, int)

    Lookup, without any context.

    Declaration
    public virtual IList<Lookup.LookupResult> DoLookup(string key, int num)
    Parameters
    Type Name Description
    string key
    int num
    Returns
    Type Description
    IList<Lookup.LookupResult>

    Get(string)

    Returns the weight associated with an input string, or null if it does not exist.

    Declaration
    public virtual object Get(string key)
    Parameters
    Type Name Description
    string key
    Returns
    Type Description
    object

    GetSizeInBytes()

    Returns byte size of the underlying FST.

    Declaration
    public override long GetSizeInBytes()
    Returns
    Type Description
    long
    Overrides
    Lookup.GetSizeInBytes()

    Load(DataInput)

    Discard current lookup data and load it from a previously saved copy. Optional operation.

    Declaration
    public override bool Load(DataInput input)
    Parameters
    Type Name Description
    DataInput input

    the Lucene.Net.Store.DataInput to load the lookup data.

    Returns
    Type Description
    bool

    true if completed successfully, false if unsuccessful or not supported.

    Overrides
    Lookup.Load(DataInput)
    Exceptions
    Type Condition
    IOException

    when fatal IO error occurs.

    Store(DataOutput)

    Persist the constructed lookup data to a directory. Optional operation.

    Declaration
    public override bool Store(DataOutput output)
    Parameters
    Type Name Description
    DataOutput output

    Lucene.Net.Store.DataOutput to write the data to.

    Returns
    Type Description
    bool

    true if successful, false if unsuccessful or not supported.

    Overrides
    Lookup.Store(DataOutput)
    Exceptions
    Type Condition
    IOException

    when fatal IO error occurs.

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.