Namespace Lucene.Net.Analysis.Hunspell
Stemming TokenFilter using a .NET port of the Java implementation of the Hunspell stemming algorithm.
Dictionaries can be found on OpenOffice's wiki.
Classes
Dictionary
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
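As a rough illustration of what such an in-memory structure holds, the sketch below parses a tiny, made-up .aff/.dic pair into plain Python objects. It is not the Lucene.Net Dictionary class or its API: the names ToySuffixRule, ToyDictionary, parse_aff and parse_dic are hypothetical, only SFX (suffix) rules with single-character flags are handled, and the sample data is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToySuffixRule:
    flag: str       # flag that dictionary entries use to opt in to this rule
    strip: str      # characters removed from the stem before appending ("" if none)
    add: str        # characters appended to build the inflected form
    condition: str  # pattern the stem ending must match ("." means anything)


@dataclass
class ToyDictionary:
    words: dict = field(default_factory=dict)     # word -> set of flags
    suffixes: list = field(default_factory=list)  # list of ToySuffixRule


def parse_aff(text):
    """Parse SFX lines only; the first SFX line per flag is treated as the header."""
    rules, seen_header = [], set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != "SFX":
            continue
        flag = parts[1]
        if flag not in seen_header:
            seen_header.add(flag)  # e.g. "SFX S Y 2": cross-product flag, rule count
            continue
        strip = "" if parts[2] == "0" else parts[2]  # "0" means nothing is stripped
        add = "" if parts[3] == "0" else parts[3]
        condition = parts[4] if len(parts) > 4 else "."
        rules.append(ToySuffixRule(flag, strip, add, condition))
    return rules


def parse_dic(text):
    """Parse 'word/FLAGS' entries; the first line is the approximate word count."""
    words = {}
    for line in text.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        word, _, flags = line.partition("/")
        words.setdefault(word, set()).update(flags)  # one character per flag
    return words


# Invented sample data, not a real dictionary.
AFF = "SFX S Y 2\nSFX S 0 s [^sxy]\nSFX S y ies y\n"
DIC = "3\ncat/S\npony/S\nwalk\n"

toy = ToyDictionary(words=parse_dic(DIC), suffixes=parse_aff(AFF))
```

Keeping the word list and the affix rules side by side is what lets a stemmer undo a suffix and then check that the resulting stem exists and carries the matching flag.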
HunspellStemFilter
TokenFilter that uses hunspell affix rules and words to stem tokens.
Since hunspell supports a word having multiple stems, this filter can emit multiple tokens for each consumed token.
Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, IsKeyword should be set to true in a previous TokenStream.
Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
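The behaviour described above (several stems for one token, and keyword-marked terms bypassing the stemmer) can be sketched with a small, self-contained toy. This is not the HunspellStemFilter implementation or the Lucene.Net API: WORDS, SUFFIX_RULES, stems and toy_stem_filter are hypothetical names, the data is invented, and passing unknown words through unchanged is an assumption made for the sketch.

```python
import re

# Invented toy data: word -> set of flags, and (flag, strip, add, condition) suffix rules.
WORDS = {"leave": {"S"}, "leaf": {"V"}, "walk": set()}
SUFFIX_RULES = [
    ("S", "", "s", "[^sxy]"),  # leave -> leaves
    ("V", "f", "ves", "f"),    # leaf  -> leaves
]


def stems(word):
    """Return every dictionary stem that can produce `word` (possibly several)."""
    found = []
    if word in WORDS:
        found.append(word)
    for flag, strip, add, condition in SUFFIX_RULES:
        if add and not word.endswith(add):
            continue
        # Undo the rule: drop the appended characters, restore the stripped ones.
        candidate = word[: len(word) - len(add)] + strip
        if (
            flag in WORDS.get(candidate, ())
            and re.search(condition + "$", candidate)
            and candidate not in found
        ):
            found.append(candidate)
    return found


def toy_stem_filter(tokens):
    """tokens: iterable of (term, is_keyword) pairs; yields output terms."""
    for term, is_keyword in tokens:
        if is_keyword:
            yield term  # keyword-marked terms are never passed to the stemmer
            continue
        result = stems(term)
        if result:
            yield from result  # one output token per stem
        else:
            yield term  # assumption: unknown words pass through unchanged


output = list(toy_stem_filter([("leaves", False), ("leaves", True), ("xyz", False)]))
```

Here the unmarked "leaves" yields two tokens because both "leave" and "leaf" can produce it, while the keyword-marked "leaves" is emitted as-is.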
HunspellStemFilterFactory
TokenFilterFactory that creates instances of HunspellStemFilter. Example config for British English:
<filter class="solr.HunspellStemFilterFactory"
dictionary="en_GB.dic,my_custom.dic"
affix="en_GB.aff"
ignoreCase="false"
longestOnly="false" />
The dictionary and affix parameters are both mandatory; as the example shows, dictionary accepts a comma-separated list of .dic files. Dictionaries for many languages are available through the OpenOffice project.
See http://wiki.apache.org/solr/Hunspell