
    Namespace Lucene.Net.Analysis.Hunspell

    Stemming TokenFilter based on a .NET port of Lucene's Java implementation of the Hunspell stemming algorithm.

    Hunspell dictionaries can be found on OpenOffice's wiki.

    Classes

    Dictionary

    In-memory structure for the dictionary (.dic) and affix (.aff) data of a Hunspell dictionary.
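    To make the idea of an in-memory dictionary structure concrete, here is a minimal Python sketch (not the Lucene.NET implementation) that loads a toy .dic file and the suffix (SFX) rules of a toy .aff file into a plain dict and a list. It assumes single-character flags and ignores prefix rules, rule conditions, and most other .aff directives; the names SuffixRule, parse_dic, parse_aff, and expand are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SuffixRule:
    flag: str       # affix class flag, e.g. "S"
    strip: str      # characters removed from the end of the stem ("" for 0)
    add: str        # suffix appended to form the surface word
    condition: str  # stem pattern from the .aff file; ignored in this sketch

def parse_dic(text):
    """Map each .dic entry to its set of single-character affix flags."""
    lines = text.strip().splitlines()
    words = {}
    for line in lines[1:]:  # the first line holds the approximate word count
        word, _, flags = line.strip().partition("/")
        if word:
            words[word] = set(flags)
    return words

def parse_aff(text):
    """Collect SFX rule lines; 4-field headers such as 'SFX S Y 1' are skipped."""
    rules = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[0] == "SFX":
            _, flag, strip, add, condition = parts[:5]
            add = add.partition("/")[0]  # drop continuation flags, if any
            rules.append(SuffixRule(flag,
                                    "" if strip == "0" else strip,
                                    "" if add == "0" else add,
                                    condition))
    return rules

def expand(word, words, rules):
    """Return the word plus every suffixed form its flags allow."""
    forms = [word]
    for r in rules:
        if r.flag in words.get(word, set()) and word.endswith(r.strip):
            forms.append(word[: len(word) - len(r.strip)] + r.add)
    return forms

DIC = """3
cat/S
dog/S
walk/SD
"""
AFF = """SET UTF-8
SFX S Y 1
SFX S 0 s .
SFX D Y 1
SFX D 0 ed .
"""
words = parse_dic(DIC)
rules = parse_aff(AFF)
print(expand("walk", words, rules))  # ['walk', 'walks', 'walked']
```

    Keeping the word list and the affix rules separate is what lets a small .dic file describe many surface forms.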

    HunspellStemFilter

    TokenFilter that uses Hunspell affix rules and words to stem tokens.
    Since Hunspell supports a word having multiple stems, this filter can emit multiple tokens for each consumed token.
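    The following Python sketch illustrates, under simplifying assumptions, how one surface token can map back to several stems. The word list and the (flag, strip, add) suffix rules are hypothetical toy data, and stems and stem_filter are illustrative names, not Lucene.NET API. In this sketch, extra stems get a position increment of 0 so they stack on the original token's position, and tokens with no known stem pass through unchanged.

```python
# Hypothetical toy data: dictionary word -> suffix flags it accepts
WORDS = {"axe": {"S"}, "axis": {"E"}, "cat": {"S"}}
# (flag, strip, add): a form is built by removing `strip` from the end
# of the stem and appending `add`
RULES = [("S", "", "s"), ("E", "is", "es")]

def stems(token):
    """Return every dictionary word that could have produced `token`."""
    found = []
    if token in WORDS:
        found.append(token)
    for flag, strip, add in RULES:
        if add and token.endswith(add):
            base = token[: len(token) - len(add)] + strip
            if flag in WORDS.get(base, set()) and base not in found:
                found.append(base)
    return found

def stem_filter(tokens):
    """Yield (term, position_increment); extra stems share one position."""
    for token in tokens:
        candidates = stems(token) or [token]  # unknown words pass through
        for i, term in enumerate(candidates):
            yield term, 1 if i == 0 else 0

print(list(stem_filter(["axes", "cats", "dogs"])))
# [('axe', 1), ('axis', 0), ('cat', 1), ('dogs', 1)]
```

    Here "axes" is ambiguous: it can come from "axe" plus "s" or from "axis" with "is" replaced by "es", so both stems are emitted.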

    Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, IsKeyword should be set to true in a previous TokenStream.

    Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
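    A hedged Python sketch of the two notes above: a keyword-aware stemming step that leaves flagged tokens untouched, and a repeat step that emits each token once flagged and once unflagged, so both the original and the stemmed form survive. Token.is_keyword stands in for KeywordAttribute.IsKeyword, and toy_stem is a hypothetical one-entry stemmer, not Hunspell.

```python
from dataclasses import dataclass

@dataclass
class Token:
    term: str
    is_keyword: bool = False  # stand-in for KeywordAttribute.IsKeyword

def keyword_repeat(tokens):
    """Emit every token twice: once marked as a keyword, once unmarked."""
    for t in tokens:
        yield Token(t.term, is_keyword=True)
        yield Token(t.term, is_keyword=False)

def keyword_aware_stem(tokens, stem):
    """Stem only tokens whose keyword flag is not set."""
    for t in tokens:
        yield t if t.is_keyword else Token(stem(t.term), t.is_keyword)

def toy_stem(word):
    """Hypothetical stemmer standing in for the Hunspell lookup."""
    return {"walked": "walk"}.get(word, word)

protected = [Token("walked", is_keyword=True)]
print([t.term for t in keyword_aware_stem(protected, toy_stem)])
# ['walked']
both = keyword_repeat([Token("walked")])
print([t.term for t in keyword_aware_stem(both, toy_stem)])
# ['walked', 'walk']
```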

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.

    HunspellStemFilterFactory

    TokenFilterFactory that creates instances of HunspellStemFilter. Example config for British English:

    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_GB.dic,my_custom.dic"
            affix="en_GB.aff" 
            ignoreCase="false"
            longestOnly="false" />

    Both the dictionary and affix parameters are mandatory. Dictionaries for many languages are available through the OpenOffice project.
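    As an illustration of how the attributes in the example config could be interpreted, here is a Python sketch with hypothetical names (parse_hunspell_args, select_stems); it is not the factory's actual code. It assumes dictionary is a comma-separated list of .dic files, affix names a single .aff file, ignoreCase and longestOnly default to false, and longestOnly keeps only the longest stem.

```python
def parse_hunspell_args(args):
    """Validate and normalize factory-style attributes (illustrative sketch)."""
    missing = [k for k in ("dictionary", "affix") if not args.get(k)]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    return {
        # several .dic files may be listed, separated by commas
        "dictionaries": [p.strip() for p in args["dictionary"].split(",") if p.strip()],
        "affix": args["affix"].strip(),
        # assumed defaults: both flags are false unless set to "true"
        "ignore_case": args.get("ignoreCase", "false").lower() == "true",
        "longest_only": args.get("longestOnly", "false").lower() == "true",
    }

def select_stems(stems, longest_only):
    """With longest_only, keep one stem: the longest (first wins on ties)."""
    if longest_only and stems:
        return [max(stems, key=len)]
    return stems

config = parse_hunspell_args({
    "dictionary": "en_GB.dic,my_custom.dic",
    "affix": "en_GB.aff",
    "ignoreCase": "false",
    "longestOnly": "false",
})
print(config["dictionaries"])  # ['en_GB.dic', 'my_custom.dic']
print(select_stems(["axe", "axis"], longest_only=True))  # ['axis']
```

    Failing fast on a missing dictionary or affix mirrors the requirement that both parameters be supplied.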

    See http://wiki.apache.org/solr/Hunspell

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)