Namespace Lucene.Net.Analysis.Hunspell

Stemming TokenFilter using a .NET port of the Java implementation of the Hunspell stemming algorithm.

Dictionaries can be found on OpenOffice's wiki.

Classes

Dictionary

In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
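As a minimal sketch of how the affix and word data end up in memory, the following builds a Dictionary from two in-memory streams. The affix and word-list contents are made-up toy data (not a real dictionary), the class and method names are hypothetical, and the two-stream constructor is assumed to match the Lucene.NET 4.8 API.

```csharp
using System.IO;
using System.Text;
using HunspellDictionary = Lucene.Net.Analysis.Hunspell.Dictionary;

public static class DictionarySketch
{
    // Toy affix data (assumption): flag S allows the suffix "s" on any word.
    private const string AffixData =
        "SET UTF-8\n" +
        "SFX S Y 1\n" +
        "SFX S 0 s .\n";

    // Toy word list (assumption): the first line is the entry count.
    private const string WordData =
        "2\n" +
        "cat/S\n" +
        "dog/S\n";

    public static HunspellDictionary Load()
    {
        using (var affix = new MemoryStream(Encoding.UTF8.GetBytes(AffixData)))
        using (var words = new MemoryStream(Encoding.UTF8.GetBytes(WordData)))
        {
            // Parses the .aff and .dic content into the in-memory structure.
            return new HunspellDictionary(affix, words);
        }
    }
}
```

In practice the two streams would come from the .aff and .dic files of a published dictionary rather than from string constants.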

HunspellStemFilter

Lucene.Net.Analysis.TokenFilter that uses hunspell affix rules and words to stem tokens.
Since hunspell supports a word having multiple stems, this filter can emit multiple tokens for each consumed token.

Note: This filter is aware of the Lucene.Net.Analysis.TokenAttributes.KeywordAttribute. To prevent certain terms from being passed to the stemmer, IsKeyword should be set to true in a previous Lucene.Net.Analysis.TokenStream.

Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
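The keyword handling described above can be sketched as a small analysis chain. This is an illustration under assumptions, not the library's prescribed usage: the class name, method name, and the protectedTerms parameter are hypothetical, and the chain uses SetKeywordMarkerFilter as one possible way to set IsKeyword before the stemmer runs.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Hunspell;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;
using HunspellDictionary = Lucene.Net.Analysis.Hunspell.Dictionary;

public static class HunspellChainSketch
{
    // Returns every term the chain emits for the given text.
    // A single input token may yield several stems.
    public static IList<string> Stem(
        string text, HunspellDictionary dictionary, CharArraySet protectedTerms)
    {
        var terms = new List<string>();

        TokenStream stream =
            new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        // Marks protected terms as keywords so the stemmer leaves them untouched.
        stream = new SetKeywordMarkerFilter(stream, protectedTerms);
        stream = new HunspellStemFilter(stream, dictionary);

        try
        {
            var termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        finally
        {
            stream.Dispose();
        }
        return terms;
    }
}
```

Replacing SetKeywordMarkerFilter with KeywordRepeatFilter would instead emit each token twice, once marked as a keyword, so that the original term survives alongside its stems.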

Note

This API is experimental and might change in incompatible ways in the next release.

HunspellStemFilterFactory

TokenFilterFactory that creates instances of HunspellStemFilter. Example config for British English:

<filter class="solr.HunspellStemFilterFactory"
        dictionary="en_GB.dic,my_custom.dic"
        affix="en_GB.aff" 
        ignoreCase="false"
        longestOnly="false" />

The dictionary and affix parameters are both mandatory; dictionary accepts a comma-separated list of files, as in the example above. Dictionaries for many languages are available through the OpenOffice project.
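For code that does not go through a factory, the example configuration can be approximated by constructing the objects directly. This is a sketch under assumptions: the directory parameter and the class and method names are hypothetical, the constructor overloads are assumed to match Lucene.NET 4.8, and the mapping of ignoreCase and longestOnly to constructor arguments reflects how the options read rather than a confirmed description of the factory's internals.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Hunspell;
using HunspellDictionary = Lucene.Net.Analysis.Hunspell.Dictionary;

public static class FactoryEquivalentSketch
{
    // 'directory' is a hypothetical folder holding the files named in the config.
    public static HunspellDictionary LoadDictionary(string directory)
    {
        var wordLists = new List<Stream>();
        Stream affix = null;
        try
        {
            affix = File.OpenRead(Path.Combine(directory, "en_GB.aff"));
            wordLists.Add(File.OpenRead(Path.Combine(directory, "en_GB.dic")));
            wordLists.Add(File.OpenRead(Path.Combine(directory, "my_custom.dic")));

            // Third argument corresponds to ignoreCase="false".
            return new HunspellDictionary(affix, wordLists, false);
        }
        finally
        {
            if (affix != null) affix.Dispose();
            foreach (var stream in wordLists) stream.Dispose();
        }
    }

    public static TokenStream Wrap(TokenStream input, HunspellDictionary dictionary)
    {
        // dedup: true drops duplicate stems; the last argument
        // corresponds to longestOnly="false".
        return new HunspellStemFilter(input, dictionary, true, false);
    }
}
```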

See http://wiki.apache.org/solr/Hunspell

Note

This API is experimental and might change in incompatible ways in the next release.