Namespace Lucene.Net.Analysis.Cz

Analyzer for Czech.

Classes

CzechAnalyzer

A Lucene.Net.Analysis.Analyzer for the Czech language.

Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.

You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CzechAnalyzer:

As of 3.1, words are stemmed with CzechStemFilter
As of 2.9, StopFilter preserves position increments
As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068)
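As a minimal sketch of the constructor usage described above, the following creates a CzechAnalyzer with an explicit LuceneVersion and an alternative stopword list, then prints the terms it produces. It assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package referenced; the stopword entries, the field name "body", and the sample text are hypothetical and chosen only for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cz;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CzechAnalyzerExample
{
    public static void Main()
    {
        // Assumption: targeting Lucene.NET 4.8 behavior (stemming and
        // position increments enabled, per the version notes above).
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Hypothetical alternative stopword list; pass only the version
        // to the constructor to use the default set instead.
        var stopwords = new CharArraySet(version, new[] { "a", "v", "na" }, true);

        using (var analyzer = new CzechAnalyzer(version, stopwords))
        using (TokenStream stream = analyzer.GetTokenStream("body", "Pes a kočka v zahradách"))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();

            // Standard TokenStream consumption workflow: Reset, IncrementToken, End.
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

Passing an older LuceneVersion value should select the corresponding legacy behavior listed above, for example no stemming before 3.1.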

CzechStemFilter

A Lucene.Net.Analysis.TokenFilter that applies CzechStemmer to stem Czech words.

To prevent terms from being stemmed, place an instance of SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.IKeywordAttribute, before this Lucene.Net.Analysis.TokenStream in the analysis chain.

NOTE: Input is expected to be lowercase but to retain its diacritical marks.
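The chain ordering described above can be sketched as follows: lowercase first, then mark protected terms as keywords, then stem. This is an illustrative sketch, assuming Lucene.NET 4.8 and its Analyzer.NewAnonymous helper; the protected term "praha", the field name "body", and the sample text are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Cz;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CzechStemFilterExample
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Hypothetical set of terms that should bypass the stemmer.
        var protectedTerms = new CharArraySet(version, new[] { "praha" }, true);

        Analyzer analyzer = Analyzer.NewAnonymous((fieldName, reader) =>
        {
            Tokenizer source = new StandardTokenizer(version, reader);
            // CzechStemFilter expects lowercase input, so lowercase first.
            TokenStream chain = new LowerCaseFilter(version, source);
            // Sets IKeywordAttribute on protected terms before stemming.
            chain = new SetKeywordMarkerFilter(chain, protectedTerms);
            chain = new CzechStemFilter(chain);
            return new TokenStreamComponents(source, chain);
        });

        using (analyzer)
        using (TokenStream stream = analyzer.GetTokenStream("body", "Praha a pražské ulice"))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

No diacritics-folding filter appears in the chain, because the stemmer expects the marks to be preserved.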

CzechStemFilterFactory

Factory for CzechStemFilter.

<fieldType name="text_czstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
</fieldType>

CzechStemmer

Light Stemmer for Czech.

Implements the algorithm described in "Indexing and stemming approaches for the Czech language": http://portal.acm.org/citation.cfm?id=1598600