
    Namespace Lucene.Net.Search.Spell

    Suggest alternate spellings for words. Also see the spell checker Wiki page.

    Classes

    CombineSuggestion

    A suggestion generated by combining one or more original query terms.

    DirectSpellChecker

    Simple automaton-based spellchecker.

    Candidates are presented directly from the term dictionary, based on Levenshtein distance. This is an alternative to SpellChecker if you are using an edit-distance-like metric such as Levenshtein or JaroWinklerDistance.

    A practical benefit of this spellchecker is that it requires no additional data structures (neither in RAM nor on disk) to do its work.

    DirectSpellChecker.ScoreTerm

    Holds a spelling correction for internal usage inside DirectSpellChecker.

    HighFrequencyDictionary

    HighFrequencyDictionary: terms taken from the given field of a Lucene index, which appear in a number of documents above a given threshold.

    Threshold is a value in [0..1] representing the minimum fraction of all documents in which a term must appear.

    Based on LuceneDictionary.
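
    The threshold rule can be illustrated with a small standalone sketch. The helper below is hypothetical and is not part of Lucene.NET; it assumes the fraction is converted to a minimum document count by truncation and that the comparison is inclusive, which may differ from the library's actual behavior.

```csharp
using System;

// Hypothetical illustration only; not a Lucene.NET type.
public static class FrequencyThresholdSketch
{
    // Returns true when a term that appears in docFreq documents meets the
    // threshold, given numDocs documents in total.
    // Assumption: threshold * numDocs is truncated to an integer count and
    // the comparison is inclusive (>=).
    public static bool IsFrequent(int docFreq, int numDocs, float threshold)
    {
        if (threshold < 0f || threshold > 1f)
        {
            throw new ArgumentOutOfRangeException(nameof(threshold));
        }
        int minNumDocs = (int)(threshold * numDocs);
        return docFreq >= minNumDocs;
    }
}
```

    Under these assumptions, with 10 documents and a threshold of 0.5, a term must appear in at least 5 documents to be included.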

    JaroWinklerDistance

    Similarity measure for short strings such as person names. See http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

    LevensteinDistance

    Levenshtein edit distance class.

    LuceneDictionary

    Lucene Dictionary: terms taken from the given field of a Lucene index.

    LuceneLevenshteinDistance

    Damerau-Levenshtein (optimal string alignment) distance, implemented consistently with Lucene's FuzzyTermsEnum when the transpositions option is enabled.

    Notes:

    • This metric treats full Unicode codepoints as characters.
    • This metric scales raw edit distances into a floating point score based upon the shorter of the two terms.
    • Transpositions of two adjacent codepoints are treated as primitive edits.
    • Edits are applied in parallel: for example, "ab" and "bca" have distance 3.
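
    The "ab" / "bca" example can be reproduced with a minimal optimal string alignment sketch. This is a textbook dynamic-programming version written for illustration, not the LuceneLevenshteinDistance source: the class name is hypothetical, it compares UTF-16 code units rather than full codepoints, and it returns the raw edit count without the score scaling described above.

```csharp
using System;

// Hypothetical illustration only; not a Lucene.NET type.
public static class OsaDistanceSketch
{
    // Raw optimal string alignment distance: insertions, deletions,
    // substitutions and adjacent transpositions each cost 1, and no
    // substring is edited more than once.
    public static int Distance(string a, string b)
    {
        int n = a.Length, m = b.Length;
        var d = new int[n + 1, m + 1];

        // Distance from a prefix to the empty string is the prefix length.
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // delete, insert
                    d[i - 1, j - 1] + cost);                    // match or substitute

                // Adjacent transposition counts as a single primitive edit.
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }
                d[i, j] = best;
            }
        }
        return d[n, m];
    }
}
```

    Here OsaDistanceSketch.Distance("ab", "bca") yields 3, because a transposed pair cannot be edited again afterwards, while Distance("ab", "ba") yields 1. An unrestricted Damerau-Levenshtein distance would give 2 for the first pair (transpose, then insert "c" between the swapped characters).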

    NOTE: this class is not particularly efficient. It is only intended for merging results from multiple DirectSpellCheckers.

    NGramDistance

    N-Gram version of edit distance based on paper by Grzegorz Kondrak, "N-gram similarity and distance". Proceedings of the Twelfth International Conference on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005. http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf

    This implementation uses the position-based optimization to compute partial matches of n-gram sub-strings and adds a null-character prefix of size n-1 so that the first character is contained in the same number of n-grams as a middle character. Null-character prefix matches are discounted so that strings with no matching characters will return a distance of 0.

    PlainTextDictionary

    Dictionary represented by a text file.

    Format allowed: one word per line:

     word1
     word2
     word3

    SpellChecker

    Spell Checker class (Main class)
    (initially inspired by the David Spencer code).

    Example Usage (C#):

     SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
     // To index a field of a user index:
     spellchecker.IndexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
     // To index a file containing words:
     spellchecker.IndexDictionary(new PlainTextDictionary(new FileInfo("myfile.txt")));
     string[] suggestions = spellchecker.SuggestSimilar("misspelt", 5);

    SuggestWord

    SuggestWord, used in the SuggestSimilar method of the SpellChecker class.

    Default sort is first by score, then by frequency.
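
    A two-key ordering like the one described above can be sketched as follows. The types here are hypothetical stand-ins, not the Lucene.NET classes, and any further tie-breaking the library may apply after score and frequency is not modelled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a suggestion; not the Lucene.NET SuggestWord.
public sealed class SuggestionSketch
{
    public string Term;
    public float Score;
    public int Freq;
}

// Orders by score first; frequency is consulted only when scores are equal.
public sealed class ScoreThenFreqComparerSketch : IComparer<SuggestionSketch>
{
    public int Compare(SuggestionSketch a, SuggestionSketch b)
    {
        int byScore = a.Score.CompareTo(b.Score);
        if (byScore != 0)
        {
            return byScore;
        }
        return a.Freq.CompareTo(b.Freq);
    }
}
```

    Sorting a List<SuggestionSketch> with this comparer places the weakest suggestion first; reversing the list afterwards gives best-first order. Swapping the two comparisons gives the frequency-first ordering.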

    SuggestWordFrequencyComparer

    Frequency first, then score.

    SuggestWordQueue

    Sorts SuggestWord instances.

    SuggestWordScoreComparer

    Score first, then frequency.

    TermFreqIteratorWrapper

    Wraps a BytesRefIterator as a TermFreqIterator, with all weights set to 1.

    WordBreakSpellChecker

    A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.

    Interfaces

    IDictionary

    A simple interface representing a Dictionary. A Dictionary here is a list of entries, where every entry consists of term, weight and payload.

    IStringDistance

    Interface for string distances.

    ITermFreqIterator

    Interface for enumerating (term, weight) pairs.

    Enums

    SuggestMode

    Set of strategies for suggesting related terms.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.

    WordBreakSpellChecker.BreakSuggestionSortMethod

    Determines the order in which word break suggestions are listed.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)