Namespace Lucene.Net.Analysis.Ru
Analyzer for Russian.
Classes
RussianAnalyzer
Lucene.Net.Analysis.Analyzer for the Russian language.
Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating RussianAnalyzer:
- As of 3.1, StandardTokenizer is used, Snowball stemming is done with SnowballFilter, and Snowball stopwords are used by default.
RussianLetterTokenizer
A RussianLetterTokenizer is a Lucene.Net.Analysis.Tokenizer that extends LetterTokenizer by also allowing the basic Latin digits 0-9.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating RussianLetterTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.
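The token-character rule described above (letters plus the basic Latin digits 0-9) can be sketched in a few lines. This is a language-neutral illustration in Python, not Lucene.NET code: the names is_token_char and tokenize are hypothetical, and Python's str.isalpha() is assumed to be a close enough stand-in for the letter test the tokenizer performs.

```python
ASCII_DIGITS = "0123456789"


def is_token_char(ch):
    """Sketch of the rule: any letter, or one of the basic Latin digits 0-9."""
    return ch.isalpha() or ch in ASCII_DIGITS


def tokenize(text):
    """Split text into maximal runs of token characters (hypothetical helper)."""
    tokens, current = [], []
    for ch in text:
        if is_token_char(ch):
            current.append(ch)
        elif current:
            # A non-token character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


sample = tokenize("Москва 2024, Lucene4.8!")
```

In this sketch, digits stay attached to adjacent letters, while punctuation and whitespace act as token boundaries. No lowercasing is applied here.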
RussianLetterTokenizerFactory
Factory for RussianLetterTokenizer.
RussianLightStemFilter
A Lucene.Net.Analysis.TokenFilter that applies RussianLightStemmer to stem Russian words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.KeywordAttribute, before this Lucene.Net.Analysis.TokenStream.
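The interaction between keyword marking and stemming can be illustrated with a small, self-contained Python sketch. It does not use the Lucene.NET API: Token, mark_keywords, stem_filter, and toy_stem are hypothetical names, and toy_stem is a deliberately trivial placeholder, not the RussianLightStemmer algorithm.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    is_keyword: bool = False  # stands in for the keyword attribute


def mark_keywords(tokens, protected):
    """Flag every token whose text is in the protected set (hypothetical marker stage)."""
    for tok in tokens:
        yield Token(tok.text, tok.is_keyword or tok.text in protected)


def toy_stem(word):
    """Placeholder stemmer: drops one trailing 'ы' or 'и'. Illustration only."""
    return word[:-1] if len(word) > 3 and word[-1] in "ыи" else word


def stem_filter(tokens, stem=toy_stem):
    """Stem each token unless an earlier stage flagged it as a keyword."""
    for tok in tokens:
        yield tok if tok.is_keyword else Token(stem(tok.text), tok.is_keyword)


tokens = [Token("книги"), Token("столы")]
# The marker stage must run before the stemming stage for the flag to take effect.
result = [t.text for t in stem_filter(mark_keywords(tokens, {"книги"}))]
```

The point of the sketch is the ordering: the marking stage sits earlier in the chain than the stemming stage, so protected terms pass through unchanged.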
RussianLightStemFilterFactory
Factory for RussianLightStemFilter.
<fieldType name="text_rulgtstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RussianLightStemFilterFactory"/>
  </analyzer>
</fieldType>
RussianLightStemmer
Light Stemmer for Russian.
This stemmer implements the algorithm described in "Indexing and Searching Strategies for the Russian Language" by Ljiljana Dolamic and Jacques Savoy.