    Class DirectSpellChecker

    Simple automaton-based spellchecker.

    Candidates are presented directly from the term dictionary, based on Levenshtein distance. This is an alternative to SpellChecker if you are using an edit-distance-like metric such as Levenshtein or JaroWinklerDistance.

    A practical benefit of this spellchecker is that it requires no additional data structures (neither in RAM nor on disk) to do its work.

    Inheritance
    System.Object
    DirectSpellChecker
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Search.Spell
    Assembly: Lucene.Net.Suggest.dll
    Syntax
    public class DirectSpellChecker

    Constructors


    DirectSpellChecker()

    Creates a DirectSpellChecker with default configuration values.

    Declaration
    public DirectSpellChecker()
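
    As a rough sketch of overriding the defaults, the properties documented on this page can be set with an object initializer. The class name SpellCheckerFactory, the method CreateStrict, and the specific values are illustrative assumptions, not recommendations from the Lucene.NET documentation. The snippet assumes the Lucene.Net.Suggest package is referenced.

```csharp
using System.Globalization;
using Lucene.Net.Search.Spell;

public static class SpellCheckerFactory
{
    // Hypothetical helper: builds a checker tuned for stricter, cheaper suggestions.
    public static DirectSpellChecker CreateStrict()
    {
        return new DirectSpellChecker
        {
            MaxEdits = 1,          // only draw candidates one edit away
            MinPrefix = 1,         // the first character must match exactly
            MinQueryLength = 4,    // skip very short query terms
            Accuracy = 0.5f,       // minimum score required from the distance metric
            LowerCaseTerms = true, // lowercase terms before comparing
            // Fixed culture so lowercasing does not depend on the machine's locale.
            LowerCaseTermsCulture = CultureInfo.InvariantCulture
        };
    }
}
```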

    Fields


    INTERNAL_LEVENSHTEIN

    The default StringDistance: Damerau-Levenshtein distance, implemented internally via LevenshteinAutomata.

    Note: this is the fastest distance metric. Damerau-Levenshtein is already used to draw candidates from the term dictionary, so this metric simply reuses that scoring.

    Declaration
    public static readonly IStringDistance INTERNAL_LEVENSHTEIN
    Field Value
    Type Description
    IStringDistance

    Properties


    Accuracy

    Gets or sets the minimal accuracy required (default: 0.5f) from a StringDistance for a suggestion match.

    Declaration
    public virtual float Accuracy { get; set; }
    Property Value
    Type Description
    System.Single

    Comparer

    Gets or sets the comparer for sorting suggestions. The default is DEFAULT_COMPARER.

    Declaration
    public virtual IComparer<SuggestWord> Comparer { get; set; }
    Property Value
    Type Description
    System.Collections.Generic.IComparer<SuggestWord>

    Distance

    Gets or sets the string distance metric. The default is INTERNAL_LEVENSHTEIN.

    Note: because this spellchecker draws its candidates from the term dictionary using Damerau-Levenshtein, it works best with an edit-distance-like string metric. If you use a different metric than the default, you might want to consider increasing MaxInspections to draw more candidates for your metric to rank.

    Declaration
    public virtual IStringDistance Distance { get; set; }
    Property Value
    Type Description
    IStringDistance
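
    The note above can be sketched as follows: swap in JaroWinklerDistance and raise MaxInspections so the new metric has more candidates to rank. The class name JaroWinklerSetup and the doubling factor are assumptions chosen for illustration. A suitable value for MaxInspections depends on the index.

```csharp
using Lucene.Net.Search.Spell;

public static class JaroWinklerSetup
{
    // Hypothetical helper: uses a non-default metric and widens the candidate pool.
    public static DirectSpellChecker Create()
    {
        var checker = new DirectSpellChecker();

        // Candidates are still drawn by Damerau-Levenshtein; this only changes how they are scored.
        checker.Distance = new JaroWinklerDistance();

        // Illustrative factor: inspect twice as many candidates as the configured default.
        checker.MaxInspections = checker.MaxInspections * 2;

        return checker;
    }
}
```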

    LowerCaseTerms

    Gets or sets whether the spellchecker should lowercase terms (default: true).

    This is a convenience option. If your index field has more complicated analysis (such as StandardTokenizer removing punctuation), it's probably better to turn this off and instead run your query terms through your Analyzer first.

    If this option is off, case differences count as an edit!

    Declaration
    public virtual bool LowerCaseTerms { get; set; }
    Property Value
    Type Description
    System.Boolean
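
    When LowerCaseTerms is turned off, query text can be run through the same Analyzer used at index time before it is passed to the spellchecker. The following is a minimal sketch of that step. The class QueryTermAnalysis and the method AnalyzeTerms are hypothetical names, and the code assumes the Lucene.NET 4.8 token stream API (GetTokenStream, ICharTermAttribute).

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class QueryTermAnalysis
{
    // Hypothetical helper: returns the terms the analyzer produces for the given text.
    public static IList<string> AnalyzeTerms(Analyzer analyzer, string field, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.GetTokenStream(field, text))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                 // required before the first IncrementToken call
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();                   // signals that the stream has been fully consumed
        }
        return terms;
    }
}
```

    Each returned string can then be wrapped as new Term(field, text) and passed to SuggestSimilar on a checker whose LowerCaseTerms is set to false.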

    LowerCaseTermsCulture

    Gets or sets the culture to use for lowercasing terms. Set to null (the default) to use System.Globalization.CultureInfo.CurrentCulture.

    Declaration
    public virtual CultureInfo LowerCaseTermsCulture { get; set; }
    Property Value
    Type Description
    System.Globalization.CultureInfo

    MaxEdits

    Gets or sets the maximum Levenshtein edit distance within which candidate terms are drawn. This value can be 1 or 2. The default is 2.

    Note: a large number of spelling errors occur at an edit distance of 1. Setting this value to 1 can increase both performance and precision at the cost of recall.

    Declaration
    public virtual int MaxEdits { get; set; }
    Property Value
    Type Description
    System.Int32
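
    To make the edit counts concrete, the sketch below computes a restricted Damerau-Levenshtein distance (the optimal string alignment variant), in which an insertion, deletion, substitution, or transposition of adjacent characters each counts as one edit. It is a plain dynamic-programming illustration, not the automaton-based implementation that Lucene.NET uses. EditDistanceSketch is a hypothetical name.

```csharp
using System;

public static class EditDistanceSketch
{
    // Restricted Damerau-Levenshtein (optimal string alignment) distance.
    public static int Distance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete all of a's prefix
        for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert all of b's prefix

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // substitution or match

                // Transposition of two adjacent characters counts as a single edit.
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }
                d[i, j] = best;
            }
        }
        return d[a.Length, b.Length];
    }
}
```

    Under this measure "lucnee" and "lucen" are each one edit from "lucene", so they are reachable with MaxEdits set to 1, while "lcuen" is two edits away and needs MaxEdits set to 2. "Lucene" is one edit from "lucene", which is why case differences matter when LowerCaseTerms is off.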

    MaxInspections

    Gets or sets the maximum number of top-N inspections per suggestion.

    Increasing this number can improve the accuracy of results, at the cost of performance.

    Declaration
    public virtual int MaxInspections { get; set; }
    Property Value
    Type Description
    System.Int32

    MaxQueryFrequency

    Gets or sets the maximum threshold (default: 0.01f) of documents a query term can appear in for suggestions to be provided.

    Very high-frequency terms are typically spelled correctly. Additionally, this can increase performance, because no work is done for the common case of correctly spelled input terms.

    This can be specified as a relative fraction of documents, such as 0.5f (50%), or as an absolute whole document frequency, such as 4f. Absolute document frequencies may not be fractional.

    Declaration
    public virtual float MaxQueryFrequency { get; set; }
    Property Value
    Type Description
    System.Single

    MinPrefix

    Gets or sets the minimal number of leading characters that must match exactly.

    This can improve both performance and accuracy of results, because misspellings rarely occur in the first character.

    Declaration
    public virtual int MinPrefix { get; set; }
    Property Value
    Type Description
    System.Int32

    MinQueryLength

    Gets or sets the minimum length of a query term (default: 4) needed to return suggestions.

    Very short query terms often yield only poor suggestions, whatever the distance metric.

    Declaration
    public virtual int MinQueryLength { get; set; }
    Property Value
    Type Description
    System.Int32

    ThresholdFrequency

    Gets or sets the minimal threshold of documents a term must appear in to be suggested.

    This can improve quality by only suggesting high-frequency terms. Note that very high values might decrease performance slightly by forcing the spellchecker to draw more candidates from the term dictionary, but a practical value such as 1 can be very useful for improving quality.

    This can be specified as a relative fraction of documents, such as 0.5f (50%), or as an absolute whole document frequency, such as 4f. Absolute document frequencies may not be fractional.

    Declaration
    public virtual float ThresholdFrequency { get; set; }
    Property Value
    Type Description
    System.Single
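
    The two ways of reading a frequency value, as described for MaxQueryFrequency and ThresholdFrequency, can be sketched as below. This is an illustration of the documented rule only. FrequencyThresholdSketch and ToDocumentCount are hypothetical names, and rounding the relative case up is an assumption that may differ from what the library does internally.

```csharp
using System;

public static class FrequencyThresholdSketch
{
    // Converts a threshold into a document count for an index holding maxDoc documents.
    // Values below 1 are read as a fraction of maxDoc; values of 1 or more as an absolute count.
    public static int ToDocumentCount(float threshold, int maxDoc)
    {
        if (threshold < 0f)
        {
            throw new ArgumentOutOfRangeException(nameof(threshold));
        }

        if (threshold >= 1f)
        {
            // Absolute document frequencies may not be fractional (for example 2.5f).
            if (threshold != (float)Math.Floor(threshold))
            {
                throw new ArgumentException(
                    "Absolute document frequencies may not be fractional.", nameof(threshold));
            }
            return (int)threshold;
        }

        // Relative case. Rounding up is an assumption made for this sketch.
        return (int)Math.Ceiling(threshold * (double)maxDoc);
    }
}
```

    For an index of 1000 documents, 0.5f resolves to 500 documents and 4f resolves to 4 documents in this sketch.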

    Methods


    SuggestSimilar(Term, Int32, IndexReader)

    Calls SuggestSimilar(Term, Int32, IndexReader, SuggestMode) as SuggestSimilar(term, numSug, ir, SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX).

    Declaration
    public virtual SuggestWord[] SuggestSimilar(Term term, int numSug, IndexReader ir)
    Parameters
    Type Name Description
    Lucene.Net.Index.Term term
    System.Int32 numSug
    Lucene.Net.Index.IndexReader ir
    Returns
    Type Description
    SuggestWord[]

    SuggestSimilar(Term, Int32, IndexReader, SuggestMode)

    Calls SuggestSimilar(Term, Int32, IndexReader, SuggestMode, Single) as SuggestSimilar(term, numSug, ir, suggestMode, this.accuracy).

    Declaration
    public virtual SuggestWord[] SuggestSimilar(Term term, int numSug, IndexReader ir, SuggestMode suggestMode)
    Parameters
    Type Name Description
    Lucene.Net.Index.Term term
    System.Int32 numSug
    Lucene.Net.Index.IndexReader ir
    SuggestMode suggestMode
    Returns
    Type Description
    SuggestWord[]

    SuggestSimilar(Term, Int32, IndexReader, SuggestMode, Single)

    Suggest similar words.

    Unlike SpellChecker, the similarity used to fetch the most relevant terms is an edit distance, so a low value for numSug typically works very well.

    Declaration
    public virtual SuggestWord[] SuggestSimilar(Term term, int numSug, IndexReader ir, SuggestMode suggestMode, float accuracy)
    Parameters
    Type Name Description
    Lucene.Net.Index.Term term

    Term you want to spell check on

    System.Int32 numSug

    the maximum number of suggested words

    Lucene.Net.Index.IndexReader ir

    IndexReader to find terms from

    SuggestMode suggestMode

    specifies when to return suggested words

    System.Single accuracy

    Return only suggested words whose similarity to the term is at least this value

    Returns
    Type Description
    SuggestWord[]

    sorted list of the suggested words according to the comparer

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.
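
    An end-to-end call might look like the sketch below, which builds a tiny in-memory index and asks for suggestions for a misspelled term. The class DirectSpellCheckerDemo, the field name "body", and the sample texts are assumptions made for illustration. The code assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common and Lucene.Net.Suggest packages referenced.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class DirectSpellCheckerDemo
{
    // Hypothetical demo: indexes three short texts, then suggests corrections for one term.
    public static SuggestWord[] Suggest(string misspelled)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using (var directory = new RAMDirectory())
        {
            using (var analyzer = new StandardAnalyzer(version))
            using (var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer)))
            {
                string[] texts = { "apache lucene search", "lucene spellchecker", "search engine" };
                foreach (string text in texts)
                {
                    var doc = new Document();
                    doc.Add(new TextField("body", text, Field.Store.NO));
                    writer.AddDocument(doc);
                }
                writer.Commit();
            }

            using (DirectoryReader reader = DirectoryReader.Open(directory))
            {
                var checker = new DirectSpellChecker();
                // Ask for at most 5 suggestions, and only if the term is absent from the index.
                return checker.SuggestSimilar(
                    new Term("body", misspelled),
                    5,
                    reader,
                    SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX,
                    0.5f);
            }
        }
    }
}
```

    Calling DirectSpellCheckerDemo.Suggest("lucen") returns an array of SuggestWord. Each element is assumed here to expose the suggested text, its score, and its document frequency (String, Score, and Freq in Lucene.NET 4.8).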


    SuggestSimilar(Term, Int32, IndexReader, Int32, Int32, Single, CharsRef)

    Provide spelling corrections based on several parameters.

    Declaration
    protected virtual ICollection<DirectSpellChecker.ScoreTerm> SuggestSimilar(Term term, int numSug, IndexReader ir, int docfreq, int editDistance, float accuracy, CharsRef spare)
    Parameters
    Type Name Description
    Lucene.Net.Index.Term term

    The term to suggest spelling corrections for

    System.Int32 numSug

    The maximum number of spelling corrections

    Lucene.Net.Index.IndexReader ir

    The index reader to fetch the candidate spelling corrections from

    System.Int32 docfreq

    The minimum document frequency a potential suggestion needs to have in order to be included

    System.Int32 editDistance

    The maximum edit distance candidates are allowed to have

    System.Single accuracy

    The minimum accuracy a suggested spelling correction needs to have in order to be included

    Lucene.Net.Util.CharsRef spare

    A scratch character buffer

    Returns
    Type Description
    System.Collections.Generic.ICollection<DirectSpellChecker.ScoreTerm>

    A collection of spelling corrections sorted by ScoreTerm's natural order.

    Exceptions
    Type Condition
    System.IO.IOException

    If I/O related errors occur

    See Also

    LevenshteinAutomata
    FuzzyTermsEnum
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)