Namespace Lucene.Net.Analysis.Fa
Analyzer for Persian.
Classes
PersianAnalyzer
Lucene.Net.Analysis.Analyzer for Persian.
This Analyzer uses PersianCharFilter, which means text is tokenized around the zero-width non-joiner (U+200C) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
PersianCharFilter
Lucene.Net.Analysis.CharFilter that replaces instances of the zero-width non-joiner (U+200C) with an ordinary space.
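The effect of this replacement can be illustrated with a minimal sketch. It is not the Lucene.Net implementation, which works on a character stream; `replace_zwnj` is a hypothetical helper name used only for this example. Because one character is swapped for exactly one other character, the text length (and therefore character offsets) stays unchanged.

```python
# Illustrative sketch only; not the Lucene.Net PersianCharFilter API.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def replace_zwnj(text: str) -> str:
    """Replace every zero-width non-joiner with an ordinary space.

    Hypothetical helper: a one-for-one substitution, so the length of
    the text is preserved.
    """
    return text.replace(ZWNJ, " ")


# A word written with a ZWNJ between its two parts.
sample = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"
filtered = replace_zwnj(sample)

# Once the ZWNJ is a space, splitting on whitespace yields two tokens.
tokens = filtered.split()
```

After filtering, a whitespace-based split sees two tokens where the original text had a single ZWNJ-joined word.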
PersianCharFilterFactory
Factory for PersianCharFilter.
<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
PersianNormalizationFilter
A Lucene.Net.Analysis.TokenFilter that applies PersianNormalizer to normalize the orthography.
PersianNormalizationFilterFactory
Factory for PersianNormalizationFilter.
<fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
PersianNormalizer
Normalizer for Persian.
Normalization is done in place for efficiency, operating on a term buffer.
Normalization is defined as:
- Normalization of various heh + hamza forms and heh goal to heh.
- Normalization of Farsi yeh and yeh barree to Arabic yeh.
- Normalization of Persian keheh to Arabic kaf.
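The rules above can be sketched as an in-place pass over a character buffer. This is an illustrative sketch, not the Lucene.Net source: the function name `normalize`, the constant names, and the choice of code points are assumptions based on the Unicode character names, and treating a standalone combining hamza above (U+0654) as a character to delete is one plausible way to handle the "heh + hamza" case.

```python
# Illustrative sketch only; code points are assumed from Unicode names.
YEH = "\u064a"          # ARABIC LETTER YEH
FARSI_YEH = "\u06cc"    # ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06d2"   # ARABIC LETTER YEH BARREE
KEHEH = "\u06a9"        # ARABIC LETTER KEHEH
KAF = "\u0643"          # ARABIC LETTER KAF
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining)
HEH_YEH = "\u06c0"      # ARABIC LETTER HEH WITH YEH ABOVE
HEH_GOAL = "\u06c1"     # ARABIC LETTER HEH GOAL
HEH = "\u0647"          # ARABIC LETTER HEH


def normalize(buf: list, length: int) -> int:
    """Normalize the first `length` characters of `buf` in place.

    Returns the new logical length; characters past it are stale.
    """
    i = 0
    while i < length:
        ch = buf[i]
        if ch in (FARSI_YEH, YEH_BARREE):
            buf[i] = YEH
        elif ch == KEHEH:
            buf[i] = KAF
        elif ch in (HEH_YEH, HEH_GOAL):
            buf[i] = HEH
        elif ch == HAMZA_ABOVE:
            # Drop the combining mark by shifting the tail left one slot,
            # then re-examine the character now at position i.
            buf[i:length - 1] = buf[i + 1:length]
            length -= 1
            continue
        i += 1
    return length


term = list(KEHEH + FARSI_YEH + HEH + HAMZA_ABOVE)
new_len = normalize(term, len(term))
normalized = "".join(term[:new_len])
```

Returning the new length instead of a new string mirrors the in-place design described above: the caller keeps reusing one buffer and only reads its first `new_len` characters.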