Namespace Lucene.Net.Analysis.Fa
Analyzer for Persian.
Classes
PersianAnalyzer
Analyzer for Persian.
This Analyzer uses PersianCharFilter, which means text is tokenized around the zero-width non-joiner in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
PersianCharFilter
CharFilter that replaces each zero-width non-joiner (U+200C) with an ordinary space.
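The effect of this replacement can be illustrated with a minimal, self-contained Python sketch. It does not call Lucene.Net; `replace_zwnj` is a hypothetical helper, and a plain whitespace split stands in for the real tokenizer, which is an assumption made only for illustration.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def replace_zwnj(text: str) -> str:
    """Replace every ZWNJ with an ordinary space (one character for one)."""
    return text.replace(ZWNJ, " ")


# A Persian word written with a ZWNJ between prefix and stem:
# meem, farsi yeh, ZWNJ, khah, waw, alef, heh, meem
sample = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"

filtered = replace_zwnj(sample)

# Whitespace split used here as a stand-in for the tokenizer (assumption).
tokens = filtered.split()
```

Because the replacement is one character for one, the filtered text in this sketch has the same length as the input, and the word splits into two tokens at the former ZWNJ position.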
PersianCharFilterFactory
Factory for PersianCharFilter.
<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
PersianNormalizationFilter
A TokenFilter that applies PersianNormalizer to normalize the orthography.
PersianNormalizationFilterFactory
Factory for PersianNormalizationFilter.
<fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
PersianNormalizer
Normalizer for Persian.
Normalization is done in place for efficiency, operating directly on a term buffer.
Normalization is defined as:
- Normalization of various heh + hamza forms and heh goal to heh.
- Normalization of Farsi yeh and yeh barree to Arabic yeh.
- Normalization of Persian keheh to Arabic kaf.
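The rules above can be sketched as an in-place pass over a character buffer. This is a standalone Python illustration, not the Lucene.Net implementation: the `normalize` function and its signature are hypothetical, the code points are taken from the Unicode character names, and treating "heh + hamza forms" as U+06C0 plus removal of the combining hamza above (U+0654) is an assumption about what that rule covers.

```python
from typing import List

YEH = "\u064a"          # ARABIC LETTER YEH
FARSI_YEH = "\u06cc"    # ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06d2"   # ARABIC LETTER YEH BARREE
KEHEH = "\u06a9"        # ARABIC LETTER KEHEH
KAF = "\u0643"          # ARABIC LETTER KAF
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining)
HEH_YEH = "\u06c0"      # ARABIC LETTER HEH WITH YEH ABOVE
HEH_GOAL = "\u06c1"     # ARABIC LETTER HEH GOAL
HEH = "\u0647"          # ARABIC LETTER HEH


def normalize(buf: List[str], length: int) -> int:
    """Normalize buf[:length] in place and return the new logical length."""
    i = 0
    while i < length:
        ch = buf[i]
        if ch in (FARSI_YEH, YEH_BARREE):
            buf[i] = YEH
        elif ch == KEHEH:
            buf[i] = KAF
        elif ch in (HEH_YEH, HEH_GOAL):
            buf[i] = HEH
        elif ch == HAMZA_ABOVE:
            # Assumed handling of heh + combining hamza: drop the hamza by
            # shifting the tail left one slot; re-examine index i afterwards.
            buf[i:length - 1] = buf[i + 1:length]
            length -= 1
            continue
        i += 1
    return length


# keheh, heh, combining hamza above, farsi yeh
term = list("\u06a9\u0647\u0654\u06cc")
new_len = normalize(term, len(term))
normalized = "".join(term[:new_len])
```

The buffer itself is never reallocated; only the returned length shrinks when a character is deleted, so callers should read `term[:new_len]` and ignore anything past it.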