Namespace Lucene.Net.Search.Spell
Suggest alternate spellings for words. Also see the spell checker Wiki page.
Classes
CombineSuggestion
A suggestion generated by combining one or more original query terms.
DirectSpellChecker
Simple automaton-based spellchecker.
Candidates are presented directly from the term dictionary, based on Levenshtein distance. This is an alternative to SpellChecker if you are using an edit-distance-like metric such as Levenshtein or JaroWinklerDistance.
A practical benefit of this spellchecker is that it requires no additional data structures (neither in RAM nor on disk) to do its work.
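To make the "no additional data structures" point concrete, here is a hypothetical brute-force sketch. It scans an in-memory term-to-document-frequency map that stands in for the index's term dictionary, keeps terms within a maximum Levenshtein distance, and ranks them by distance and then by frequency. The real class intersects a Levenshtein automaton with the term dictionary rather than scanning every term; the names `DirectLookupSketch` and `Suggest`, and the `maxEdits = 2` default, are illustrative assumptions, not Lucene.Net API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: a brute-force stand-in for DirectSpellChecker.
// The "term dictionary" is modeled as an in-memory map of term -> document frequency.
public static class DirectLookupSketch
{
    // Plain Levenshtein distance (insert, delete, substitute) using two DP rows.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the finished row becomes "previous"
        }
        return prev[b.Length];
    }

    // Existing terms within maxEdits of the input: closest first, then most frequent.
    // The input word itself is skipped.
    public static List<string> Suggest(string word, IReadOnlyDictionary<string, int> termDocFreq,
                                       int maxEdits = 2, int numSug = 5)
    {
        return termDocFreq
            .Where(kv => kv.Key != word)
            .Select(kv => (Term: kv.Key, Freq: kv.Value, Dist: EditDistance(word, kv.Key)))
            .Where(c => c.Dist <= maxEdits)
            .OrderBy(c => c.Dist)
            .ThenByDescending(c => c.Freq)
            .Take(numSug)
            .Select(c => c.Term)
            .ToList();
    }
}
```

With terms `lucene`, `lucent`, `license`, and `search`, a query for `lucine` keeps only `lucene` (one edit) and `lucent` (two edits); `license` needs three edits and is dropped.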
DirectSpellChecker.ScoreTerm
Holds a spelling correction for internal usage inside DirectSpellChecker.
HighFrequencyDictionary
HighFrequencyDictionary: terms taken from the given field of a Lucene index, which appear in a number of documents above a given threshold.
Threshold is a value in [0..1] representing the minimum fraction of the total number of documents in which a term must appear.
Based on LuceneDictionary.
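The threshold rule can be sketched over an in-memory term-to-document-frequency map. This is a hypothetical illustration, not the class itself: `HighFrequencySketch` and `FrequentTerms` are made-up names, and converting the fraction to a document count by truncating `thresh * numDocs` is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the HighFrequencyDictionary filter over an in-memory map.
public static class HighFrequencySketch
{
    // thresh is a fraction in [0, 1]; a term survives if it appears in at least
    // (int)(thresh * numDocs) documents (the truncation is an assumption of this sketch).
    public static IEnumerable<string> FrequentTerms(
        IReadOnlyDictionary<string, int> termDocFreq, int numDocs, float thresh)
    {
        if (thresh < 0f || thresh > 1f)
            throw new ArgumentOutOfRangeException(nameof(thresh));
        int minNumDocs = (int)(thresh * numDocs);
        return termDocFreq.Where(kv => kv.Value >= minNumDocs).Select(kv => kv.Key);
    }
}
```

For example, with 100 documents and a threshold of 0.1, only terms that occur in at least 10 documents are kept, which filters out rare misspellings that would otherwise be offered as suggestions.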
JaroWinklerDistance
Similarity measure for short strings such as person names. See http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
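For reference, the textbook Jaro-Winkler measure can be sketched as follows. This is the standard formulation, not a transcription of Lucene's class: the prefix scale of 0.1, the 4-character prefix cap, and the 0.7 boost threshold are assumptions, and Lucene's implementation may differ in such details. As with the other measures in this namespace, 1.0 means identical.

```csharp
using System;

// Sketch of the textbook Jaro-Winkler similarity (1.0 = identical, 0.0 = nothing in common).
// Assumptions: prefix scale 0.1, prefix capped at 4 chars, boost applied only when Jaro > 0.7.
public static class JaroWinklerSketch
{
    public static double Similarity(string s1, string s2)
    {
        if (s1.Length == 0 && s2.Length == 0) return 1.0;
        if (s1.Length == 0 || s2.Length == 0) return 0.0;

        int window = Math.Max(0, Math.Max(s1.Length, s2.Length) / 2 - 1);
        var matched1 = new bool[s1.Length];
        var matched2 = new bool[s2.Length];
        int matches = 0;

        // Count characters that match within the search window.
        for (int i = 0; i < s1.Length; i++)
        {
            int lo = Math.Max(0, i - window), hi = Math.Min(s2.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (matched2[j] || s1[i] != s2[j]) continue;
                matched1[i] = matched2[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Matched characters that appear in a different order (half-transpositions).
        int halfTranspositions = 0;
        for (int i = 0, k = 0; i < s1.Length; i++)
        {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1[i] != s2[k]) halfTranspositions++;
            k++;
        }

        double m = matches;
        double jaro = (m / s1.Length + m / s2.Length + (m - halfTranspositions / 2.0) / m) / 3.0;
        if (jaro <= 0.7) return jaro;

        // Winkler boost for a shared prefix of up to 4 characters.
        int prefix = 0;
        int maxPrefix = Math.Min(4, Math.Min(s1.Length, s2.Length));
        while (prefix < maxPrefix && s1[prefix] == s2[prefix]) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

The classic example `MARTHA` vs. `MARHTA` has six matches and one transposition, giving a Jaro score of 17/18 (about 0.944); the shared 3-character prefix lifts it to about 0.961.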
LevensteinDistance
Levenshtein edit distance class.
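A minimal sketch of the classic Levenshtein computation, turned into a similarity score where 1.0 means identical. Normalizing by the length of the longer string is an assumption of this sketch, and `LevenshteinSketch` is a hypothetical name rather than the Lucene.Net class.

```csharp
using System;

// Sketch: classic Levenshtein distance turned into a similarity in [0, 1].
// Assumption: normalization divides by the longer string's length, so 1.0 means identical.
public static class LevenshteinSketch
{
    public static int Distance(string s, string t)
    {
        var d = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i; // delete all of s
        for (int j = 0; j <= t.Length; j++) d[0, j] = j; // insert all of t
        for (int i = 1; i <= s.Length; i++)
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
            }
        return d[s.Length, t.Length];
    }

    public static float Similarity(string s, string t)
    {
        int maxLen = Math.Max(s.Length, t.Length);
        return maxLen == 0 ? 1f : 1f - (float)Distance(s, t) / maxLen;
    }
}
```

For `kitten` vs. `sitting` the raw distance is 3 (two substitutions and one insertion), so the similarity is 1 - 3/7.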
LuceneDictionary
Lucene Dictionary: terms taken from the given field of a Lucene index.
LuceneLevenshteinDistance
Damerau-Levenshtein (optimal string alignment) distance, implemented consistently with Lucene's FuzzyTermsEnum with the transpositions option enabled.
Notes:
- This metric treats full Unicode code points as characters.
- This metric scales raw edit distances into a floating-point score based on the length of the shorter of the two terms.
- Transpositions of two adjacent codepoints are treated as primitive edits.
- Edits are applied in parallel: for example, "ab" and "bca" have distance 3.
NOTE: this class is not particularly efficient. It is only intended for merging results from multiple DirectSpellCheckers.
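The notes above can be illustrated with a sketch of optimal string alignment over code points. It is hypothetical (`OsaSketch` is not a Lucene.Net type); the score formula 1 - distance / (length of the shorter term) follows the scaling note but is an assumption, as is the handling of empty input. Note that this formula can go negative when the distance exceeds the shorter length.

```csharp
using System;
using System.Collections.Generic;

// Sketch of optimal string alignment (restricted Damerau-Levenshtein) over Unicode code points.
// Assumption: score = 1 - distance / (code-point length of the shorter term); may be negative.
public static class OsaSketch
{
    static int[] CodePoints(string s)
    {
        var cps = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                cps.Add(char.ConvertToUtf32(s[i], s[i + 1]));
                i++; // skip the low surrogate just consumed
            }
            else cps.Add(s[i]);
        }
        return cps.ToArray();
    }

    public static int Distance(string a, string b)
    {
        int[] s = CodePoints(a), t = CodePoints(b);
        var d = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= t.Length; j++) d[0, j] = j;
        for (int i = 1; i <= s.Length; i++)
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
                // An adjacent transposition counts as one primitive edit.
                if (i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1])
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
            }
        return d[s.Length, t.Length];
    }

    public static float Score(string a, string b)
    {
        int minLen = Math.Min(CodePoints(a).Length, CodePoints(b).Length);
        if (minLen == 0) return a == b ? 1f : 0f; // empty-input handling is an assumption
        return 1f - (float)Distance(a, b) / minLen;
    }
}
```

Because edits are not applied on top of each other, `ab` vs. `bca` costs 3 here, whereas unrestricted Damerau-Levenshtein would find 2 (transpose, then insert). An emoji counts as a single character, so `a😀b` vs. `ab` is one deletion.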
NGramDistance
N-Gram version of edit distance based on paper by Grzegorz Kondrak, "N-gram similarity and distance". Proceedings of the Twelfth International Conference on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005. http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf
This implementation uses the position-based optimization to compute partial matches of n-gram sub-strings and adds a null-character prefix of size n-1 so that the first character is contained in the same number of n-grams as a middle character. Null-character prefix matches are discounted so that strings with no matching characters will return a distance of 0.
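The padding-and-discount scheme described above can be sketched as an edit-distance table whose substitution cost is the fraction of mismatched characters between the i-th source n-gram and the j-th target n-gram. This is an illustrative reconstruction under stated assumptions (n defaults to 2, `'\0'` is the padding character, comparison is per UTF-16 char, empty-input handling is made up); `NGramSketch` is a hypothetical name and results may differ in detail from Lucene's NGramDistance.

```csharp
using System;

// Sketch of the position-based n-gram distance described above (after Kondrak, SPIRE 2005).
// Assumptions: n defaults to 2, '\0' is the padding character, per-UTF-16-char comparison.
public static class NGramSketch
{
    public static float Similarity(string source, string target, int n = 2)
    {
        int sl = source.Length, tl = target.Length;
        if (sl == 0 || tl == 0) return sl == tl ? 1f : 0f;

        // Prefix n-1 null characters so the first char occurs in as many n-grams as a middle one.
        string sa = new string('\0', n - 1) + source;
        string ta = new string('\0', n - 1) + target;

        var prev = new float[sl + 1];
        var curr = new float[sl + 1];
        for (int i = 0; i <= sl; i++) prev[i] = i;

        for (int j = 1; j <= tl; j++)
        {
            curr[0] = j;
            for (int i = 1; i <= sl; i++)
            {
                // Compare the i-th n-gram of the source with the j-th n-gram of the target.
                int cost = 0, tn = n;
                for (int k = 0; k < n; k++)
                {
                    char sc = sa[i - 1 + k], tc = ta[j - 1 + k];
                    if (sc != tc) cost++;
                    else if (sc == '\0') tn--; // matching padding does not count as a match
                }
                float ec = (float)cost / tn; // partial (fractional) substitution cost
                curr[i] = Math.Min(Math.Min(curr[i - 1] + 1, prev[i] + 1), prev[i - 1] + ec);
            }
            (prev, curr) = (curr, prev);
        }
        return 1f - prev[sl] / Math.Max(sl, tl);
    }
}
```

Because matching padding is discounted, `abc` vs. `xyz` ends with a raw cost of 3 and a score of 0, while identical strings score 1.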
PlainTextDictionary
Dictionary represented by a text file.
Format allowed: one word per line:
word1
word2
word3
SpellChecker
Spell checker class (the main class), initially inspired by the David Spencer code.
Example Usage (C#):
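using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Util;

SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index (IndexDictionary also takes an IndexWriterConfig and a fullMerge flag):
spellchecker.IndexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(LuceneVersion.LUCENE_48, null), false);
// To index a file containing words (a fresh IndexWriterConfig is created for each call):
spellchecker.IndexDictionary(new PlainTextDictionary(new FileInfo("myfile.txt")), new IndexWriterConfig(LuceneVersion.LUCENE_48, null), false);
string[] suggestions = spellchecker.SuggestSimilar("misspelt", 5);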
SuggestWord
SuggestWord, used by the SuggestSimilar method of the SpellChecker class.
Default sort is first by score, then by frequency.
SuggestWordFrequencyComparer
Frequency first, then score.
SuggestWordQueue
Sorts SuggestWord instances.
SuggestWordScoreComparer
Score first, then frequency.
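The two orderings ("score first, then frequency" and "frequency first, then score") can be sketched with LINQ over a hypothetical stand-in for SuggestWord. `SuggestionSketch` and `SuggestionOrderSketch` are made-up names, results are listed best first, and breaking remaining ties by ordinal text order is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SuggestWord: only the fields the orderings need.
public sealed class SuggestionSketch
{
    public SuggestionSketch(string text, float score, int freq) { Text = text; Score = score; Freq = freq; }
    public string Text { get; }
    public float Score { get; }
    public int Freq { get; }
}

public static class SuggestionOrderSketch
{
    // Score first, then frequency (best first); text tie-break is an assumption.
    public static List<SuggestionSketch> ByScore(IEnumerable<SuggestionSketch> words) =>
        words.OrderByDescending(w => w.Score)
             .ThenByDescending(w => w.Freq)
             .ThenBy(w => w.Text, StringComparer.Ordinal)
             .ToList();

    // Frequency first, then score (best first).
    public static List<SuggestionSketch> ByFrequency(IEnumerable<SuggestionSketch> words) =>
        words.OrderByDescending(w => w.Freq)
             .ThenByDescending(w => w.Score)
             .ThenBy(w => w.Text, StringComparer.Ordinal)
             .ToList();
}
```

With `lucene` (score 0.8, freq 5), `lucent` (0.8, 50), and `license` (0.5, 500), the score-first order is `lucent`, `lucene`, `license`, while the frequency-first order puts the common but less similar `license` on top.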
TermFreqEnumeratorWrapper
Wraps a Lucene.Net.Util.BytesRefEnumerator as an ITermFreqEnumerator, with all weights set to 1.
WordBreakSpellChecker
A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.
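The two operations, breaking one term into two words and combining adjacent terms into one, can be sketched against an in-memory word set. This is a hypothetical illustration (`WordBreakSketch`, `Breaks`, and `Combines` are made-up names); the real checker consults index term frequencies and applies limits such as the number of changes, none of which is modeled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the two word-break ideas against an in-memory word set.
// Index frequencies and change/length limits used by the real checker are omitted.
public static class WordBreakSketch
{
    // Split a term into two known words, e.g. "spellchecker" -> "spell checker".
    public static List<string> Breaks(string term, ISet<string> words)
    {
        var result = new List<string>();
        for (int cut = 1; cut < term.Length; cut++)
        {
            string left = term.Substring(0, cut), right = term.Substring(cut);
            if (words.Contains(left) && words.Contains(right))
                result.Add(left + " " + right);
        }
        return result;
    }

    // Join adjacent query terms whose concatenation is a known word.
    public static List<string> Combines(IReadOnlyList<string> terms, ISet<string> words)
    {
        var result = new List<string>();
        for (int i = 0; i + 1 < terms.Count; i++)
        {
            string joined = terms[i] + terms[i + 1];
            if (words.Contains(joined)) result.Add(joined);
        }
        return result;
    }
}
```

With the words `spell`, `checker`, and `spellchecker`, the query term `spellchecker` yields the break `spell checker`, and the query `spell checker` yields the combination `spellchecker`.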
Interfaces
IDictionary
A simple interface representing a Dictionary. A Dictionary here is a list of entries, where every entry consists of term, weight and payload.
IStringDistance
Interface for string distances.
ITermFreqEnumerator
Interface for enumerating term/weight pairs.
Enums
SuggestMode
Set of strategies for suggesting related terms.
Note
This API is experimental and might change in incompatible ways in the next release.
WordBreakSpellChecker.BreakSuggestionSortMethod
Determines the order in which word-break suggestions are listed.