    Namespace Lucene.Net.Analysis.Cz

    Analyzer for Czech.

    Classes

    CzechAnalyzer

    Lucene.Net.Analysis.Analyzer for the Czech language.

    Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating a CzechAnalyzer. The version determines which of the following behaviors apply:

    • As of 3.1, words are stemmed with CzechStemFilter
    • As of 2.9, StopFilter preserves position increments
    • As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068)
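
    As a minimal sketch of the two construction paths described above (default stopwords versus an alternative list), the following assumes a project that references the Lucene.Net 4.8 core and analysis-common packages. The class name CzechAnalyzerSketch, the field name "body", the sample sentence, and the sample stopwords are illustrative choices, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cz;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CzechAnalyzerSketch
{
    // Compatibility version handed to the analyzer (assumes a 4.8 build).
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Runs an analyzer over the text and collects the terms it emits.
    // The field name "body" is an arbitrary placeholder.
    public static List<string> Analyze(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        const string text = "Praha je hlavní město České republiky";

        // Default stopword set.
        using (var analyzer = new CzechAnalyzer(MatchVersion))
        {
            Console.WriteLine(string.Join(" | ", Analyze(analyzer, text)));
        }

        // Alternative stopword list; "true" makes the lookup case-insensitive.
        var stopwords = new CharArraySet(MatchVersion, new[] { "praha", "je" }, true);
        using (var analyzer = new CzechAnalyzer(MatchVersion, stopwords))
        {
            Console.WriteLine(string.Join(" | ", Analyze(analyzer, text)));
        }
    }
}
```

    With the second analyzer, the supplied words replace the default stopword set rather than extending it, so "praha" and "je" are dropped before stemming while other common words pass through.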

    CzechStemFilter

    A Lucene.Net.Analysis.TokenFilter that applies CzechStemmer to stem Czech words.

    To prevent terms from being stemmed, place an instance of SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute, before this Lucene.Net.Analysis.TokenStream in the analysis chain.

    NOTE: Input is expected to be lowercase, with diacritical marks preserved (do not strip accents before this filter).
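
    The ordering described above can be sketched as a hand-built filter chain: lowercase first, then mark protected terms, then stem. This assumes Lucene.Net 4.8; the class name ProtectedStemmingSketch, the method StemExcept, and the sample input are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Cz;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class ProtectedStemmingSketch
{
    // Compatibility version handed to each component (assumes a 4.8 build).
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Tokenizes and lowercases the text, then stems every token except
    // those listed in protectedTerms.
    public static List<string> StemExcept(string text, params string[] protectedTerms)
    {
        var keywords = new CharArraySet(MatchVersion, protectedTerms, true);

        Tokenizer tokenizer = new StandardTokenizer(MatchVersion, new StringReader(text));
        TokenStream stream = new LowerCaseFilter(MatchVersion, tokenizer); // stemmer expects lowercase
        stream = new SetKeywordMarkerFilter(stream, keywords);             // flags protected terms
        stream = new CzechStemFilter(stream);                              // skips flagged terms

        var terms = new List<string>();
        using (stream)
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // "praze" is protected, so it should come through unstemmed.
        Console.WriteLine(string.Join(" | ", StemExcept("V Praze jsou mosty", "praze")));
    }
}
```

    The marker filter has to sit after LowerCaseFilter only if the protected set is case-sensitive; here the set ignores case, so the position relative to lowercasing is a stylistic choice, while its position before CzechStemFilter is what matters.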

    CzechStemFilterFactory

    Factory for CzechStemFilter.

    <fieldType name="text_czstem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CzechStemFilterFactory"/>
      </analyzer>
    </fieldType>

    CzechStemmer

    Light Stemmer for Czech.

    Implements the algorithm described in "Indexing and stemming approaches for the Czech language":
    http://portal.acm.org/citation.cfm?id=1598600
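
    For experimenting with the stemmer outside a token stream, a small wrapper can help. The sketch below assumes the Lucene.Net 4.8 signature in which Stem takes a character buffer and a length and returns the new length; the class name CzechStemmerSketch and the sample words are illustrative only, and no particular output is implied.

```csharp
using System;
using Lucene.Net.Analysis.Cz;

public static class CzechStemmerSketch
{
    // Stems a single word. The input should already be lowercase and
    // should keep its diacritical marks.
    public static string Stem(string lowercaseWord)
    {
        char[] buffer = lowercaseWord.ToCharArray();
        // The stemmer edits the buffer in place and returns the stem length.
        int newLength = new CzechStemmer().Stem(buffer, buffer.Length);
        return new string(buffer, 0, newLength);
    }

    public static void Main()
    {
        foreach (string word in new[] { "městem", "hradů", "ženami" })
        {
            Console.WriteLine($"{word} -> {Stem(word)}");
        }
    }
}
```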

    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.