    Namespace Lucene.Net.Analysis.Fa

    Analyzer for Persian.

    Classes

    PersianAnalyzer

    Analyzer for Persian.

    This Analyzer uses PersianCharFilter, which means text is tokenized around the zero-width non-joiner as well as around whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
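    To make the "stemming via stopwords" idea concrete, here is a minimal standalone sketch in Python. It is not the Lucene.Net API. The `analyze` function and the one-entry `STOPWORDS` set are hypothetical, and the sketch skips the normalization step. Once the zero-width non-joiner becomes a token boundary, a prefix written as a separate morpheme turns into its own token, so a stopword list can remove it.

```python
ZWNJ = "\u200c"  # zero-width non-joiner

# Hypothetical stopword set for illustration only (the verbal prefix "mi");
# the real analyzer ships its own stopword list.
STOPWORDS = {"\u0645\u06cc"}


def analyze(text):
    """Split around ZWNJ and whitespace, then drop stopwords."""
    tokens = text.replace(ZWNJ, " ").split()
    return [t for t in tokens if t not in STOPWORDS]


# "mi" + ZWNJ + "khaham": the prefix becomes its own token and is removed.
tokens = analyze("\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645")
```

    In this sketch only the stem-like second token is left, which is the effect the description above calls "stemming".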

    PersianCharFilter

    CharFilter that replaces instances of the zero-width non-joiner (U+200C) with an ordinary space.
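    The replacement itself can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the Lucene.Net class; the function name `replace_zwnj` is an assumption made for the example.

```python
ZWNJ = "\u200c"  # zero-width non-joiner


def replace_zwnj(text):
    """Return text with every zero-width non-joiner replaced by a space."""
    return text.replace(ZWNJ, " ")


sample = "a\u200cb"
filtered = replace_zwnj(sample)
```

    Because one character is swapped for exactly one other character, the filtered text in this sketch has the same length as the input, so character positions line up with the original.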

    PersianCharFilterFactory

    Factory for PersianCharFilter.

    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

    PersianNormalizationFilter

    A TokenFilter that applies PersianNormalizer to normalize the orthography.

    PersianNormalizationFilterFactory

    Factory for PersianNormalizationFilter.

    <fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    PersianNormalizer

    Normalizer for Persian.

    Normalization is done in place for efficiency, operating on a term buffer.

    Normalization is defined as:

    • Normalization of various heh + hamza forms and heh goal to heh.
    • Normalization of Farsi yeh and yeh barree to Arabic yeh.
    • Normalization of Persian keheh to Arabic kaf.
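    The rules above can be sketched as an in-place pass over a character buffer. The Python below is a standalone illustration under stated assumptions, not the library's implementation. The code points are taken from the Unicode character names. Treating "heh + hamza forms" as the precomposed heh with yeh above plus a combining hamza above that is deleted is an assumption of this sketch, and the function name `normalize` is hypothetical.

```python
# Code points chosen by Unicode character name (assumption of this sketch).
YEH = "\u064a"          # ARABIC LETTER YEH
FARSI_YEH = "\u06cc"    # ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06d2"   # ARABIC LETTER YEH BARREE
KEHEH = "\u06a9"        # ARABIC LETTER KEHEH
KAF = "\u0643"          # ARABIC LETTER KAF
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining mark)
HEH_YEH = "\u06c0"      # ARABIC LETTER HEH WITH YEH ABOVE
HEH_GOAL = "\u06c1"     # ARABIC LETTER HEH GOAL
HEH = "\u0647"          # ARABIC LETTER HEH


def normalize(buf, length):
    """Normalize the first `length` items of the list `buf` in place.

    Returns the new length, which shrinks when a combining mark is removed.
    """
    i = 0
    while i < length:
        ch = buf[i]
        if ch in (FARSI_YEH, YEH_BARREE):
            buf[i] = YEH
        elif ch == KEHEH:
            buf[i] = KAF
        elif ch in (HEH_YEH, HEH_GOAL):
            buf[i] = HEH
        elif ch == HAMZA_ABOVE:
            # Assumed handling: drop the mark and re-check the same index.
            del buf[i]
            length -= 1
            continue
        i += 1
    return length


buf = list(KEHEH + FARSI_YEH)
n = normalize(buf, len(buf))
result = "".join(buf[:n])
```

    Mutating the buffer and returning the new length avoids allocating a fresh string per token, which matches the in-place design described above.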

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)