Class SpellChecker
Spell checker class (main class), initially inspired by David Spencer's code.
Example Usage (C#):
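SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// IndexDictionary also takes an IndexWriterConfig and a fullMerge flag;
// "analyzer" below is an assumed, already constructed Analyzer.
// To index a field of a user index:
spellchecker.IndexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer), false);
// To index a file containing words:
spellchecker.IndexDictionary(new PlainTextDictionary(new FileInfo("myfile.txt")), new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer), false);
string[] suggestions = spellchecker.SuggestSimilar("misspelt", 5);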
Namespace: Lucene.Net.Search.Spell
Assembly: Lucene.Net.Suggest.dll
Syntax
public class SpellChecker : IDisposable
Constructors
SpellChecker(Directory)
Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
Declaration
public SpellChecker(Directory spellIndex)
Parameters
Type | Name | Description |
---|---|---|
Directory | spellIndex | the spell index directory |
Exceptions
Type | Condition |
---|---|
IOException | if the SpellChecker cannot open the directory |
SpellChecker(Directory, IStringDistance)
Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
Declaration
public SpellChecker(Directory spellIndex, IStringDistance sd)
Parameters
Type | Name | Description |
---|---|---|
Directory | spellIndex | the spell index directory |
IStringDistance | sd | the StringDistance measurement to use |
Exceptions
Type | Condition |
---|---|
IOException | if the SpellChecker cannot open the directory |
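As a minimal sketch of the constructor above, the following creates a spell checker over an in-memory directory with a non-default distance measure. It assumes the Lucene.Net and Lucene.Net.Suggest packages (version 4.8) are referenced; the `RAMDirectory` and the choice of `JaroWinklerDistance` are illustrative assumptions, not requirements of the API.

```csharp
using System;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;

public static class CustomDistanceExample
{
    public static void Main()
    {
        // Assumption: an in-memory directory stands in for a real spell index location.
        using var spellIndex = new RAMDirectory();

        // Supply the string distance explicitly instead of relying on the default.
        using var checker = new SpellChecker(spellIndex, new JaroWinklerDistance());

        // The measure can be inspected (or replaced) later through the StringDistance property.
        Console.WriteLine(checker.StringDistance.GetType().Name);
    }
}
```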
SpellChecker(Directory, IStringDistance, IComparer<SuggestWord>)
Use the given directory as a spell checker index with the given IStringDistance measure and the given IComparer<T> for sorting the results.
Declaration
public SpellChecker(Directory spellIndex, IStringDistance sd, IComparer<SuggestWord> comparer)
Parameters
Type | Name | Description |
---|---|---|
Directory | spellIndex | The spelling index |
IStringDistance | sd | The distance |
IComparer<SuggestWord> | comparer | The comparer |
Exceptions
Type | Condition |
---|---|
IOException | if there is a problem opening the index |
Fields
DEFAULT_ACCURACY
The default minimum score to use, if not specified by setting Accuracy or overridden by calling SuggestSimilar(string, int, IndexReader, string, SuggestMode, float).
Declaration
public const float DEFAULT_ACCURACY = 0.5f
Field Value
Type | Description |
---|---|
float |
F_WORD
Field name for each word in the ngram index.
Declaration
public const string F_WORD = "word"
Field Value
Type | Description |
---|---|
string |
Properties
Accuracy
Gets or sets the accuracy (minimum score) used to decide whether a suggestion is included, unless overridden in SuggestSimilar(string, int, IndexReader, string, SuggestMode, float). The value should satisfy 0 < accuracy < 1; the default is DEFAULT_ACCURACY.
Declaration
public virtual float Accuracy { get; set; }
Property Value
Type | Description |
---|---|
float |
Comparer
Gets or sets the IComparer<T> for the SuggestWordQueue.
Declaration
public virtual IComparer<SuggestWord> Comparer { get; set; }
Property Value
Type | Description |
---|---|
IComparer<SuggestWord> |
StringDistance
Gets or sets the IStringDistance implementation for this SpellChecker instance.
Declaration
public virtual IStringDistance StringDistance { get; set; }
Property Value
Type | Description |
---|---|
IStringDistance |
Methods
ClearIndex()
Removes all terms from the spell check index.
Declaration
public virtual void ClearIndex()
Exceptions
Type | Condition |
---|---|
IOException | If there is a low-level I/O error. |
ObjectDisposedException | if the SpellChecker is already disposed |
Dispose()
Dispose the underlying Lucene.Net.Search.IndexSearcher used by this SpellChecker.
Declaration
public void Dispose()
Exceptions
Type | Condition |
---|---|
IOException | if the dispose operation causes an IOException |
ObjectDisposedException | if the SpellChecker is already disposed |
Dispose(bool)
Releases resources used by the SpellChecker and if overridden in a derived class, optionally releases unmanaged resources.
Declaration
protected virtual void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing | |
Exist(string)
Check whether the word exists in the index.
Declaration
public virtual bool Exist(string word)
Parameters
Type | Name | Description |
---|---|---|
string | word | word to check |
Returns
Type | Description |
---|---|
bool | true if the word exists in the index |
Exceptions
Type | Condition |
---|---|
IOException | If there is a low-level I/O error. |
ObjectDisposedException | if the SpellChecker is already disposed |
IndexDictionary(IDictionary, IndexWriterConfig, bool)
Indexes the data from the given IDictionary.
Declaration
public void IndexDictionary(IDictionary dict, IndexWriterConfig config, bool fullMerge)
Parameters
Type | Name | Description |
---|---|---|
IDictionary | dict | Dictionary to index |
IndexWriterConfig | config | Lucene.Net.Index.IndexWriterConfig to use |
bool | fullMerge | whether or not the spellcheck index should be fully merged |
Exceptions
Type | Condition |
---|---|
ObjectDisposedException | if the SpellChecker is already disposed |
IOException | If there is a low-level I/O error. |
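A minimal sketch of a call to this method follows. It assumes the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.Suggest packages (version 4.8); the in-memory directory, the `StandardAnalyzer`, and the three-word dictionary are illustrative assumptions. A fresh `IndexWriterConfig` is created for the call, since Lucene does not allow one config instance to be shared between writers.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class IndexDictionaryExample
{
    public static void Main()
    {
        // Assumption: in-memory storage and a StandardAnalyzer, chosen only for illustration.
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var spellIndex = new RAMDirectory();
        using var checker = new SpellChecker(spellIndex);

        // One word per line, as PlainTextDictionary expects.
        using var words = new StringReader("lucene\nsearch\nspelling\n");

        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
        checker.IndexDictionary(new PlainTextDictionary(words), config, fullMerge: false);

        // Exist reports whether a word made it into the spell index.
        Console.WriteLine(checker.Exist("lucene"));
    }
}
```

Passing `fullMerge: false` skips the full merge of the spell index, which is usually the cheaper choice for small or frequently updated dictionaries.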
SetSpellIndex(Directory)
Sets a different index as the spell checker index, or reopens the existing index if spellIndexDir is the same value as given in the constructor.
Declaration
public void SetSpellIndex(Directory spellIndexDir)
Parameters
Type | Name | Description |
---|---|---|
Directory | spellIndexDir | the spell directory to use |
Exceptions
Type | Condition |
---|---|
ObjectDisposedException | if the SpellChecker is already disposed |
IOException | if the SpellChecker cannot open the directory |
SuggestSimilar(string, int)
Suggest similar words.
Because the Lucene similarity used to fetch the most relevant n-grammed terms differs from the edit distance strategy used to pick the best-matching word among those hits, you usually need to retrieve several suggestions to get the true best match.
That is, if numSug == 1, do not count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Declaration
public virtual string[] SuggestSimilar(string word, int numSug)
Parameters
Type | Name | Description |
---|---|---|
string | word | the word you want a spell check done on |
int | numSug | the number of suggested words |
Returns
Type | Description |
---|---|
string[] | The suggested words, sorted by two criteria: first, the edit distance; second (restricted mode only), the popularity of the suggested word in the field of the user index |
Exceptions
Type | Condition |
---|---|
IOException | if the underlying index throws an IOException |
ObjectDisposedException | if the SpellChecker is already disposed |
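The advice about requesting several suggestions can be sketched as follows. This is an illustrative example, not a definitive usage pattern: it assumes the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.Suggest packages (version 4.8), and the word list and the misspelling "speling" are made up for the demonstration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class SuggestSimilarExample
{
    public static void Main()
    {
        // Assumption: a tiny in-memory dictionary, built only for this illustration.
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var spellIndex = new RAMDirectory();
        using var checker = new SpellChecker(spellIndex);
        using var words = new StringReader("spelling\nspilling\nspelled\nsmelling\nselling\n");
        checker.IndexDictionary(
            new PlainTextDictionary(words),
            new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer),
            fullMerge: false);

        // Request five candidates rather than one, then take the first entry as
        // the preferred suggestion; the array is already sorted by the checker.
        string[] suggestions = checker.SuggestSimilar("speling", 5);
        foreach (string suggestion in suggestions)
        {
            Console.WriteLine(suggestion);
        }
    }
}
```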
SuggestSimilar(string, int, IndexReader, string, SuggestMode)
Calls SuggestSimilar(string, int, IndexReader, string, SuggestMode, float) as SuggestSimilar(word, numSug, ir, field, suggestMode, Accuracy), using the current Accuracy value as the minimum score.
Declaration
public virtual string[] SuggestSimilar(string word, int numSug, IndexReader ir, string field, SuggestMode suggestMode)
Parameters
Type | Name | Description |
---|---|---|
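string | word | the word you want a spell check done on |
int | numSug | the number of suggested words |
IndexReader | ir | the IndexReader of the user index (can be null; see the field parameter) |
string | field | the field of the user index: if field is not null, the suggested words are restricted to the words present in this field. |
SuggestMode | suggestMode | (NOTE: if ir == null and/or field == null, then this is overridden with SuggestMode.SUGGEST_ALWAYS) |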
Returns
Type | Description |
---|---|
string[] |
SuggestSimilar(string, int, IndexReader, string, SuggestMode, float)
Suggest similar words (optionally restricted to a field of an index).
Because the Lucene similarity used to fetch the most relevant n-grammed terms differs from the edit distance strategy used to pick the best-matching word among those hits, you usually need to retrieve several suggestions to get the true best match.
That is, if numSug == 1, do not count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Declaration
public virtual string[] SuggestSimilar(string word, int numSug, IndexReader ir, string field, SuggestMode suggestMode, float accuracy)
Parameters
Type | Name | Description |
---|---|---|
string | word | the word you want a spell check done on |
int | numSug | the number of suggested words |
IndexReader | ir | the IndexReader of the user index (can be null; see the field parameter) |
string | field | the field of the user index: if field is not null, the suggested words are restricted to the words present in this field. |
SuggestMode | suggestMode | (NOTE: if ir == null and/or field == null, then this is overridden with SuggestMode.SUGGEST_ALWAYS) |
float | accuracy | The minimum score a suggestion must have in order to qualify for inclusion in the results |
Returns
Type | Description |
---|---|
string[] | The suggested words, sorted by two criteria: first, the edit distance; second (restricted mode only), the popularity of the suggested word in the field of the user index |
Exceptions
Type | Condition |
---|---|
IOException | if the underlying index throws an IOException |
ObjectDisposedException | if the SpellChecker is already disposed |
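Restricted mode can be sketched as below: the spell index is built from one field of a user index, and suggestions are then limited to words present in that field. The example assumes the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.Suggest packages (version 4.8); the field name "body", the sample documents, and the choice of `SuggestMode.SUGGEST_MORE_POPULAR` are assumptions made for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class RestrictedSuggestExample
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    public static void Main()
    {
        using var analyzer = new StandardAnalyzer(MatchVersion);

        // Build a small, hypothetical user index with a single "body" field.
        using var userIndex = new RAMDirectory();
        using (var writer = new IndexWriter(userIndex, new IndexWriterConfig(MatchVersion, analyzer)))
        {
            foreach (string text in new[] { "spelling checker", "spelling index", "selling books" })
            {
                var doc = new Document();
                doc.Add(new TextField("body", text, Field.Store.NO));
                writer.AddDocument(doc);
            }
        }

        // Feed the terms of the "body" field into a separate spell index.
        using var reader = DirectoryReader.Open(userIndex);
        using var spellIndex = new RAMDirectory();
        using var checker = new SpellChecker(spellIndex);
        checker.IndexDictionary(
            new LuceneDictionary(reader, "body"),
            new IndexWriterConfig(MatchVersion, analyzer),
            fullMerge: false);

        // Passing both the reader and the field restricts suggestions to that field;
        // 0.5f is the minimum score a suggestion must reach to be returned.
        string[] suggestions = checker.SuggestSimilar(
            "speling", 5, reader, "body", SuggestMode.SUGGEST_MORE_POPULAR, 0.5f);
        foreach (string suggestion in suggestions)
        {
            Console.WriteLine(suggestion);
        }
    }
}
```

If either the reader or the field is null, the method falls back to SuggestMode.SUGGEST_ALWAYS, as the parameter table notes.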
SuggestSimilar(string, int, float)
Suggest similar words.
Because the Lucene similarity used to fetch the most relevant n-grammed terms differs from the edit distance strategy used to pick the best-matching word among those hits, you usually need to retrieve several suggestions to get the true best match.
That is, if numSug == 1, do not count on that suggestion being the best one; set this value to at least 5 for a good suggestion.
Declaration
public virtual string[] SuggestSimilar(string word, int numSug, float accuracy)
Parameters
Type | Name | Description |
---|---|---|
string | word | the word you want a spell check done on |
int | numSug | the number of suggested words |
float | accuracy | The minimum score a suggestion must have in order to qualify for inclusion in the results |
Returns
Type | Description |
---|---|
string[] | The suggested words, sorted by two criteria: first, the edit distance; second (restricted mode only), the popularity of the suggested word in the field of the user index |
Exceptions
Type | Condition |
---|---|
IOException | if the underlying index throws an IOException |
ObjectDisposedException | if the SpellChecker is already disposed |