Namespace Lucene.Net.Analysis.Fa
Analyzer for Persian.
Classes
PersianAnalyzer
Lucene.Net.Analysis.Analyzer for Persian.
This Analyzer uses PersianCharFilter, so text is tokenized at zero-width non-joiners (ZWNJ, U+200C) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
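The following plain-Python sketch roughly illustrates how these pieces fit together. It also shows one plausible reading of "stemming via stopwords": affixes split off at the ZWNJ become separate tokens, and a stopword list can then drop them. This is not the Lucene.Net implementation. Whitespace splitting stands in for the real tokenizer, only the yeh and keheh mappings are shown, and `analyze` and `STOPWORDS` are hypothetical names. `STOPWORDS` is a two-entry set, not the analyzer's built-in list.

```python
# Plain-Python sketch of the PersianAnalyzer chain; NOT the Lucene.Net API.
# Assumptions: str.split() stands in for the real tokenizer, only the
# yeh/keheh standardization is shown, and STOPWORDS is hypothetical.

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Standardize Persian variant forms to their Arabic counterparts.
_STANDARDIZE = str.maketrans({
    "\u06cc": "\u064a",  # Farsi yeh -> Arabic yeh
    "\u06a9": "\u0643",  # keheh -> Arabic kaf
})

# Hypothetical stopwords, already in standardized form:
# "ها" (plural marker) and "مي" (verbal prefix written with Arabic yeh).
STOPWORDS = {"\u0647\u0627", "\u0645\u064a"}


def analyze(text: str) -> list:
    """Return the tokens this sketch keeps for `text`."""
    text = text.replace(ZWNJ, " ")  # char filter: ZWNJ -> ordinary space
    tokens = text.split()  # stand-in tokenizer: split on whitespace
    tokens = [t.translate(_STANDARDIZE) for t in tokens]  # standardize forms
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords


if __name__ == "__main__":
    books = "\u06a9\u062a\u0627\u0628\u200c\u0647\u0627"  # "کتاب‌ها" (books)
    print(analyze(books))  # the plural marker is dropped; only the stem remains
```

Under these assumptions, کتاب‌ها ("books") is split at the ZWNJ. The plural marker ها is then discarded as a stopword, leaving only the standardized stem.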
PersianCharFilter
Lucene.Net.Analysis.CharFilter that replaces each zero-width non-joiner (ZWNJ, U+200C) with an ordinary space.
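The filter's effect can be shown with a small plain-Python sketch. It is not the Lucene.Net API, and `persian_char_filter` is a hypothetical name. Because the substitution is one character for one character, the filtered text keeps the original length, so character offsets line up with the input.

```python
# Plain-Python sketch of what PersianCharFilter does; NOT the Lucene.Net API.

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
SPACE = " "


def persian_char_filter(text: str) -> str:
    """Replace every ZWNJ with an ordinary space.

    The mapping is one character to one character, so the filtered text
    has the same length as the input and offsets line up unchanged.
    """
    return text.replace(ZWNJ, SPACE)


if __name__ == "__main__":
    word = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"  # "می‌خواهم" (I want)
    print(persian_char_filter(word).split())  # two tokens instead of one
```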
PersianCharFilterFactory
Factory for PersianCharFilter.
<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
PersianNormalizationFilter
A Lucene.Net.Analysis.TokenFilter that applies PersianNormalizer to normalize the orthography.
PersianNormalizationFilterFactory
Factory for PersianNormalizationFilter.
<fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
PersianNormalizer
Normalizer for Persian.
Normalization is done in-place for efficiency, operating on a term buffer.
Normalization is defined as:
- Normalization of various heh + hamza forms and heh goal to heh.
- Normalization of Farsi yeh and yeh barree to Arabic yeh.
- Normalization of Persian keheh to Arabic kaf.
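The plain-Python sketch below applies the three rules in place. It mimics the "term buffer" style: it mutates a list of characters and returns the new logical length. It is not the Lucene.Net code, and `normalize` is a hypothetical name. The code points are standard Unicode assignments. One assumption needs flagging: the sketch deletes any combining hamza above (U+0654), which is how it reads "heh + hamza forms". Verify this against the library before relying on it.

```python
# Plain-Python sketch of the PersianNormalizer rules; NOT the Lucene.Net API.
# Assumption: a combining HAMZA ABOVE is simply deleted, so a decomposed
# heh + hamza sequence collapses to plain heh.

YEH = "\u064a"          # ARABIC LETTER YEH
FARSI_YEH = "\u06cc"    # ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06d2"   # ARABIC LETTER YEH BARREE
KEHEH = "\u06a9"        # ARABIC LETTER KEHEH
KAF = "\u0643"          # ARABIC LETTER KAF
HEH = "\u0647"          # ARABIC LETTER HEH
HEH_YEH = "\u06c0"      # ARABIC LETTER HEH WITH YEH ABOVE (precomposed heh + hamza)
HEH_GOAL = "\u06c1"     # ARABIC LETTER HEH GOAL
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining mark)

# One-to-one substitutions.
_REPLACEMENTS = {
    FARSI_YEH: YEH,
    YEH_BARREE: YEH,
    KEHEH: KAF,
    HEH_YEH: HEH,
    HEH_GOAL: HEH,
}


def normalize(buf: list, length: int) -> int:
    """Normalize buf[:length] in place and return the new length.

    Slots at or beyond the returned length are stale scratch space,
    as with a fixed-size term buffer.
    """
    i = 0
    while i < length:
        ch = buf[i]
        if ch == HAMZA_ABOVE:
            # Delete by shifting the rest of the term one slot to the left.
            buf[i:length - 1] = buf[i + 1:length]
            length -= 1
            continue  # re-examine the character now at position i
        buf[i] = _REPLACEMENTS.get(ch, ch)
        i += 1
    return length


if __name__ == "__main__":
    term = list("\u06a9\u062a\u0627\u0628\u06cc")  # keheh, teh, alef, beh, Farsi yeh
    n = normalize(term, len(term))
    print("".join(term[:n]))  # keheh -> kaf, Farsi yeh -> Arabic yeh
```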
PersianStemFilter
A Lucene.Net.Analysis.TokenFilter that applies PersianStemmer to stem Persian words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.IKeywordAttribute before this Lucene.Net.Analysis.TokenStream.
PersianStemFilterFactory
Factory for PersianStemFilter.
<fieldType name="text_fastem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
    <filter class="solr.PersianStemFilterFactory"/>
  </analyzer>
</fieldType>
PersianStemmer
Stemmer for Persian.
Stemming is done in-place for efficiency, operating on a term buffer.
Stemming is defined as:
- Removal of attached definite article, conjunction, and prepositions.
- Stemming of common suffixes.
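The plain-Python sketch below shows the suffix-stripping part of this definition together with the keyword bypass described for PersianStemFilter. Several things in it are assumptions, not PersianStemmer's actual rules:

- `SUFFIXES` is a small hypothetical list of common Persian suffixes, written in normalized form.
- The two-character minimum stem length is assumed.
- Only one suffix is stripped per term.
- Prefix removal is not shown.

Tokens are modeled as hypothetical `(term, is_keyword)` pairs standing in for IKeywordAttribute.

```python
# Plain-Python sketch of suffix stripping with a keyword bypass; NOT the
# Lucene.Net API. SUFFIXES and MIN_STEM are hypothetical choices.

SUFFIXES = [
    "\u0647\u0627",  # "ها"  plural marker
    "\u0627\u0646",  # "ان"  plural marker
    "\u062a\u0631",  # "تر"  comparative
]
MIN_STEM = 2  # never strip a suffix if fewer than 2 characters would remain


def stem(term: str) -> str:
    """Strip at most one suffix from SUFFIXES, checked in list order."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= MIN_STEM:
            return term[: -len(suffix)]
    return term


def stem_filter(tokens):
    """Yield terms from (term, is_keyword) pairs, stemming non-keywords only."""
    for term, is_keyword in tokens:
        yield term if is_keyword else stem(term)


if __name__ == "__main__":
    bigger = "\u0628\u0632\u0631\u06af\u062a\u0631"  # "بزرگتر" (bigger)
    print(list(stem_filter([(bigger, True), (bigger, False)])))
```

The first token passes through unchanged because it is marked as a keyword. The second loses its comparative suffix.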