Namespace Lucene.Net.Analysis.Cz
Analyzer for Czech.
Classes
CzechAnalyzer
A Lucene.Net.Analysis.Analyzer for the Czech language.
Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CzechAnalyzer:
- As of 3.1, words are stemmed with CzechStemFilter
- As of 2.9, StopFilter preserves position increments
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068)
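As a minimal sketch of the points above, the following creates a CzechAnalyzer with an explicit LuceneVersion and lists the terms it emits. It assumes the Lucene.NET 4.8 packages (core and analysis-common) are referenced. The class name, the Analyze helper, the field name "body", and the sample text are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cz;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CzechAnalyzerExample
{
    // Hypothetical helper: runs the analyzer over the text and collects the emitted terms.
    public static List<string> Analyze(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // "body" is an arbitrary field name used only in this sketch.
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                    // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();                      // signals that the stream is exhausted
        }
        return terms;
    }

    public static void Main()
    {
        // The version argument selects the compatibility behavior listed above;
        // with LUCENE_48 the default stopword set and CzechStemFilter are both applied.
        using (var analyzer = new CzechAnalyzer(LuceneVersion.LUCENE_48))
        {
            foreach (string term in Analyze(analyzer, "Praha a Brno"))
            {
                Console.WriteLine(term);
            }
        }
    }
}
```

An alternative stopword list can be supplied through the constructor overload that takes a CharArraySet; passing only the version keeps the default set.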
CzechStemFilter
A Lucene.Net.Analysis.TokenFilter that applies CzechStemmer to stem Czech words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.IKeywordAttribute before this Lucene.Net.Analysis.TokenStream.
NOTE: Input is expected to be lowercase, with diacritical marks retained.
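The chain described above can be sketched as a custom analyzer that lowercases the input, marks protected terms as keywords, and only then applies CzechStemFilter. This is an illustrative sketch assuming Lucene.NET 4.8, not the analyzer's own implementation. The class name, the Analyze helper, the field name "body", and the protected terms are hypothetical. Because the keyword set here is case-sensitive and is consulted after lowercasing, protected terms are given in lowercase.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Cz;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class ProtectedStemmingExample
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Hypothetical helper: stems the text but leaves the protected (lowercase) terms untouched.
    public static List<string> Analyze(string text, params string[] protectedTerms)
    {
        var keywords = new CharArraySet(MatchVersion, protectedTerms, false);
        var terms = new List<string>();

        using (Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            Tokenizer source = new StandardTokenizer(MatchVersion, reader);
            // Lowercase first: the stemmer expects lowercase input with diacritics kept.
            TokenStream result = new LowerCaseFilter(MatchVersion, source);
            // Sets IKeywordAttribute on protected terms before the stem filter sees them.
            result = new SetKeywordMarkerFilter(result, keywords);
            result = new CzechStemFilter(result);
            return new TokenStreamComponents(source, result);
        }))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // "praha" is protected, so it passes through the stem filter unchanged.
        foreach (string term in Analyze("Praha Prahy", "praha"))
        {
            Console.WriteLine(term);
        }
    }
}
```

Placing SetKeywordMarkerFilter after the stem filter would have no effect, since the keyword flag must already be set when the stemmer inspects each token.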
CzechStemFilterFactory
Factory for CzechStemFilter.
<fieldType name="text_czstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
</fieldType>
CzechStemmer
Light stemmer for Czech.
Implements the algorithm described in:
Indexing and stemming approaches for the Czech language
http://portal.acm.org/citation.cfm?id=1598600