Namespace Lucene.Net.Analysis.Hunspell

Stemming TokenFilter using a .NET port of the Java implementation of the Hunspell stemming algorithm.

Dictionaries can be found on OpenOffice's wiki.

Classes

Dictionary

In-memory structure for the dictionary (.dic) and affix (.aff) data of a Hunspell dictionary.
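
As a minimal sketch of how such a structure might be loaded, the helper below opens one affix file and one or more word lists and passes them to the Dictionary constructor that takes an affix stream, a list of dictionary streams, and an ignoreCase flag. The class name HunspellDictionaryLoader and the file paths are hypothetical, and the constructor overload is assumed to match the Lucene.NET 4.8 API; check it against the version you use.

```csharp
using System.Collections.Generic;
using System.IO;
// Alias avoids confusion with System.Collections.Generic.Dictionary<TKey, TValue>.
using HunspellDictionary = Lucene.Net.Analysis.Hunspell.Dictionary;

// Hypothetical helper, not part of Lucene.NET.
public static class HunspellDictionaryLoader
{
    // affixPath and dicPaths are placeholders for real Hunspell files,
    // for example "en_GB.aff" and "en_GB.dic".
    public static HunspellDictionary Load(string affixPath, params string[] dicPaths)
    {
        var dicStreams = new List<Stream>();
        try
        {
            using (Stream affix = File.OpenRead(affixPath))
            {
                foreach (string path in dicPaths)
                {
                    dicStreams.Add(File.OpenRead(path));
                }

                // Assumed overload: (Stream affix, IList<Stream> dictionaries, bool ignoreCase).
                // The data is parsed into memory here, so the streams can be closed afterwards.
                return new HunspellDictionary(affix, dicStreams, false);
            }
        }
        finally
        {
            foreach (Stream stream in dicStreams)
            {
                stream.Dispose();
            }
        }
    }
}
```

Because the parsed data lives in memory, it is usually worth building the Dictionary once and sharing it between filters rather than reloading it per token stream.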

HunspellStemFilter

TokenFilter that uses Hunspell affix rules and words to stem tokens.
Because Hunspell allows a word to have multiple stems, this filter can emit multiple tokens for each consumed token.

Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, set IsKeyword to true in an earlier TokenStream in the chain.

Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk
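
The following sketch shows one way the filter could be wired into an analysis chain so that protected terms bypass the stemmer. It assumes the Lucene.NET 4.8 types StandardTokenizer, LowerCaseFilter, SetKeywordMarkerFilter, and Analyzer.NewAnonymous; the class name HunspellAnalysisSketch, the field name "body", and the protected term "lucene" are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Hunspell;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;
using HunspellDictionary = Lucene.Net.Analysis.Hunspell.Dictionary;

// Hypothetical helper, not part of Lucene.NET.
public static class HunspellAnalysisSketch
{
    private const LuceneVersion Version = LuceneVersion.LUCENE_48;

    public static Analyzer Build(HunspellDictionary dictionary)
    {
        // Terms in this set are flagged as keywords and are not stemmed.
        var protectedTerms = new CharArraySet(Version, new[] { "lucene" }, true);

        return Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            Tokenizer source = new StandardTokenizer(Version, reader);
            TokenStream stream = new LowerCaseFilter(Version, source);
            // Must come before the stem filter so IsKeyword is already set.
            stream = new SetKeywordMarkerFilter(stream, protectedTerms);
            stream = new HunspellStemFilter(stream, dictionary);
            return new TokenStreamComponents(source, stream);
        });
    }

    // Writes every emitted token; one input word may yield several stems.
    public static void PrintTokens(Analyzer analyzer, string text)
    {
        using (TokenStream ts = analyzer.GetTokenStream("body", text))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            ts.End();
        }
    }
}
```

The stems actually emitted depend entirely on the loaded dictionary, so no sample output is given here. To keep the original term next to its stems, a KeywordRepeatFilter could be placed directly before the HunspellStemFilter in the same chain.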

HunspellStemFilterFactory

TokenFilterFactory that creates instances of HunspellStemFilter. Example config for British English:

<filter class="solr.HunspellStemFilterFactory"
        dictionary="en_GB.dic,my_custom.dic"
        affix="en_GB.aff" 
        ignoreCase="false"
        longestOnly="false" />

The dictionary and affix parameters are both mandatory. As the example shows, dictionary accepts a comma-separated list of .dic files. Dictionaries for many languages are available through the OpenOffice project.

See http://wiki.apache.org/solr/Hunspell

This is a Lucene.NET EXPERIMENTAL API, use at your own risk