Namespace Lucene.Net.Analysis.Hunspell
Stemming TokenFilter using a .NET port of the Java implementation of the Hunspell stemming algorithm.
Dictionaries can be found on OpenOffice's wiki.
Classes
Dictionary
In-memory structure for the dictionary (.dic) and affix (.aff) data of a Hunspell dictionary.
HunspellStemFilter
Lucene.Net.Analysis.TokenFilter that uses Hunspell affix rules and words to stem tokens.
Because Hunspell allows a word to have multiple stems, this filter can emit multiple tokens for each consumed token.
Note: This filter is aware of the Lucene.Net.Analysis.TokenAttributes.KeywordAttribute. To prevent certain terms from being passed to the stemmer, set IsKeyword to true in a preceding Lucene.Net.Analysis.TokenStream.
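In a Solr-style analyzer configuration, one way to set the keyword flag is to place a keyword marker filter ahead of the stemmer. The fragment below is a sketch, not a tested configuration. It assumes KeywordMarkerFilterFactory from the miscellaneous analysis package is available, and protwords.txt is a placeholder name for a file listing one protected term per line.

```xml
<!-- Sketch: terms listed in protwords.txt (placeholder file name) are flagged
     as keywords, so the Hunspell stemmer passes them through unchanged. -->
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
<filter class="solr.HunspellStemFilterFactory"
        dictionary="en_GB.dic"
        affix="en_GB.aff" />
```

The order matters: the marker has to run before the stemmer, since the stemmer only checks a flag that an earlier stage has already set.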
Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
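A common arrangement for this is sketched below, assuming the analyzer is configured in the same Solr-style XML used by HunspellStemFilterFactory. Each token is repeated as a keyword before stemming, and duplicates are removed afterwards for terms whose stem equals the original. RemoveDuplicatesTokenFilterFactory is not part of this namespace and is included as an assumption about the surrounding chain.

```xml
<!-- Sketch: emit each token twice (once flagged as a keyword), stem the
     unflagged copy, then drop the copy if stemming left it unchanged. -->
<filter class="solr.KeywordRepeatFilterFactory" />
<filter class="solr.HunspellStemFilterFactory"
        dictionary="en_GB.dic"
        affix="en_GB.aff" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
```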
Note
This API is experimental and might change in incompatible ways in the next release.
HunspellStemFilterFactory
TokenFilterFactory that creates instances of HunspellStemFilter. Example config for British English:
<filter class="solr.HunspellStemFilterFactory"
        dictionary="en_GB.dic,my_custom.dic"
        affix="en_GB.aff"
        ignoreCase="false"
        longestOnly="false" />
The dictionary and affix parameters are both mandatory; dictionary accepts a comma-separated list of .dic files, as in the example above. Dictionaries for many languages are available through the OpenOffice project.
See http://wiki.apache.org/solr/Hunspell
Note
This API is experimental and might change in incompatible ways in the next release.