    Namespace Lucene.Net.Analysis.Cz

    Analyzer for Czech.

    Classes

    CzechAnalyzer

    Lucene.Net.Analysis.Analyzer for the Czech language.

    Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CzechAnalyzer. The version determines which of the following behaviors apply:

    • As of 3.1, words are stemmed with CzechStemFilter.
    • As of 2.9, StopFilter preserves position increments.
    • As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
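
    As a rough illustration of the description above, the following Solr-style field type approximates an analysis chain similar to CzechAnalyzer: tokenize, lowercase, remove stopwords, then stem. It is a sketch under stated assumptions, not the analyzer's definitive internals. The field type name text_cz_sketch and the stopword file name stopwords_cz.txt are hypothetical placeholders; pointing words at a different file stands in for supplying an alternative stopword list.

    ```xml
    <fieldType name="text_cz_sketch" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Hypothetical file name: one stopword per line; these words are not indexed. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_cz.txt"/>
        <!-- Stemming runs last, on lowercased tokens that survived stopword removal. -->
        <filter class="solr.CzechStemFilterFactory"/>
      </analyzer>
    </fieldType>
    ```

    The stop filter sits before the stem filter so that stopwords are matched in their unstemmed form.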

    CzechStemFilter

    A Lucene.Net.Analysis.TokenFilter that applies CzechStemmer to stem Czech words.

    To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute, before this Lucene.Net.Analysis.TokenStream.

    NOTE: Input is expected to be lowercase, with diacritical marks preserved.
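
    One way to protect terms from stemming in a factory-based configuration is to place a keyword marker filter ahead of the stem filter, as in the sketch below. It assumes the KeywordMarkerFilterFactory with a protected word list; the field type name text_czstem_protected and the file name protwords.txt are hypothetical placeholders. Lowercasing comes first because the stemmer expects lowercase input.

    ```xml
    <fieldType name="text_czstem_protected" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Hypothetical file name: terms listed here are marked as keywords and left unstemmed. -->
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.CzechStemFilterFactory"/>
      </analyzer>
    </fieldType>
    ```

    Entries in the protected list would need to be lowercase here, since tokens are already lowercased by the time they reach the marker filter.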

    CzechStemFilterFactory

    Factory for CzechStemFilter.

    <fieldType name="text_czstem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CzechStemFilterFactory"/>
      </analyzer>
    </fieldType>

    CzechStemmer

    Light Stemmer for Czech.

    Implements the algorithm described in:
    "Indexing and stemming approaches for the Czech language", http://portal.acm.org/citation.cfm?id=1598600

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)