    Namespace Lucene.Net.Analysis.Hunspell

    Stemming TokenFilter using a .NET port of the Java implementation of the Hunspell stemming algorithm.

    Dictionaries can be found on OpenOffice's wiki.

    Classes

    Dictionary

    In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.

    HunspellStemFilter

    A Lucene.Net.Analysis.TokenFilter that uses hunspell affix rules and words to stem tokens.
    Because hunspell allows a word to have multiple stems, this filter can emit multiple tokens for each consumed token.

    Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, set IsKeyword to true in an earlier Lucene.Net.Analysis.TokenStream in the chain.

    Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    HunspellStemFilterFactory

    TokenFilterFactory that creates instances of HunspellStemFilter. Example config for British English:

    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_GB.dic,my_custom.dic"
            affix="en_GB.aff" 
            ignoreCase="false"
            longestOnly="false" />

    Both parameters dictionary and affix are mandatory. Dictionaries for many languages are available through the OpenOffice project.
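
    The note on HunspellStemFilter about keeping the original term can be sketched as a Solr-style analyzer chain. This is an illustrative fragment, not a tested configuration: the field type name text_en_hunspell and the choice of tokenizer are assumptions, and the dictionary file names are reused from the example above. KeywordRepeatFilterFactory emits each token twice, once marked as a keyword and once not, so the stemmer passes the keyword copy through unchanged; RemoveDuplicatesTokenFilterFactory then drops the extra copy when the stem equals the original term.

    ```xml
    <!-- Hypothetical field type name; adjust to your schema -->
    <fieldType name="text_en_hunspell" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Emits each token twice: once as a keyword, once as a normal token -->
        <filter class="solr.KeywordRepeatFilterFactory"/>
        <!-- Keyword tokens bypass the stemmer; the others are stemmed -->
        <filter class="solr.HunspellStemFilterFactory"
                dictionary="en_GB.dic"
                affix="en_GB.aff"
                ignoreCase="false"
                longestOnly="false"/>
        <!-- Removes the duplicate when the stem is identical to the original term -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    ```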

    See http://wiki.apache.org/solr/Hunspell

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.