    Namespace Lucene.Net.Analysis.Fa

    Analyzer for Persian.

    Classes

    PersianAnalyzer

    Lucene.Net.Analysis.Analyzer for Persian.

    This Analyzer uses PersianCharFilter, so text is tokenized around the zero-width non-joiner (U+200C) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
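The behaviour described above can be illustrated with a minimal, library-free sketch. It is not the Lucene.Net implementation: the `analyze` function and the tiny `STOP_WORDS` set are hypothetical, and the sketch assumes one reading of the "stemming via stopwords" remark, namely that a suffix detached at the zero-width non-joiner (here the plural marker) is then dropped because it appears in the stop set.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical stop set for illustration only; NOT the analyzer's bundled list.
STOP_WORDS = {
    "\u0648",        # "va" (and)
    "\u0647\u0627",  # "ha" (plural suffix, assumed here to be a stop word)
}

def analyze(text, stop_words=STOP_WORDS):
    """Split on whitespace and ZWNJ, then drop stop words."""
    spaced = text.replace(ZWNJ, " ")  # ZWNJ now acts as a token boundary
    return [tok for tok in spaced.split() if tok not in stop_words]

# "ketab" + ZWNJ + "ha", then "va", then "daftar" (books and notebook)
text = "\u06a9\u062a\u0627\u0628\u200c\u0647\u0627 \u0648 \u062f\u0641\u062a\u0631"
tokens = analyze(text)
```

In this sketch the plural form is reduced to its base word only because the detached suffix is filtered out as a stop word; no suffix-stripping rules are involved.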

    PersianCharFilter

    Lucene.Net.Analysis.CharFilter that replaces instances of the zero-width non-joiner (U+200C) with an ordinary space.
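As a rough illustration of the replacement described above, the following sketch applies the same substitution to a plain string. The function name `persian_char_filter` is hypothetical and the code does not use or mirror the Lucene.Net stream-based CharFilter API; it only shows the character-level effect.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def persian_char_filter(text: str) -> str:
    """Replace every ZWNJ with an ordinary space.

    The substitution is one character for one character, so in this
    sketch the output has the same length as the input.
    """
    return text.replace(ZWNJ, " ")

# "mi" + ZWNJ + "ravam": written as one word with a non-joiner inside
sample = "\u0645\u06cc\u200c\u0631\u0648\u0645"
filtered = persian_char_filter(sample)
parts = filtered.split()  # a whitespace tokenizer now sees two tokens
```

Because the replacement preserves length in this sketch, character positions in the filtered text line up with positions in the original input.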

    PersianCharFilterFactory

    Factory for PersianCharFilter.

    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

    PersianNormalizationFilter

    A Lucene.Net.Analysis.TokenFilter that applies PersianNormalizer to normalize the orthography.

    PersianNormalizationFilterFactory

    Factory for PersianNormalizationFilter.

    <fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    PersianNormalizer

    Normalizer for Persian.

    Normalization is done in-place for efficiency, operating on a term buffer.

    Normalization is defined as:

    • Normalization of the various heh + hamza forms and heh goal to heh.
    • Normalization of Farsi yeh and yeh barree to Arabic yeh.
    • Normalization of Persian keheh to Arabic kaf.
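The three rules above can be sketched as an in-place pass over a character buffer. This is an illustrative approximation rather than the library's code: the function name `normalize` and its buffer-plus-length signature are assumptions, the code points are taken from the Unicode names of the letters, and treating "heh + hamza forms" as both the precomposed letter U+06C0 and a combining hamza above (U+0654, dropped here) is an assumption about what that phrase covers.

```python
YEH = "\u064a"          # ARABIC LETTER YEH
FARSI_YEH = "\u06cc"    # ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06d2"   # ARABIC LETTER YEH BARREE
KEHEH = "\u06a9"        # ARABIC LETTER KEHEH
KAF = "\u0643"          # ARABIC LETTER KAF
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining)
HEH_YEH = "\u06c0"      # ARABIC LETTER HEH WITH YEH ABOVE
HEH_GOAL = "\u06c1"     # ARABIC LETTER HEH GOAL
HEH = "\u0647"          # ARABIC LETTER HEH

def normalize(buf: list, length: int) -> int:
    """Normalize buf[0:length] in place and return the new length."""
    i = 0
    while i < length:
        ch = buf[i]
        if ch in (FARSI_YEH, YEH_BARREE):
            buf[i] = YEH
        elif ch == KEHEH:
            buf[i] = KAF
        elif ch in (HEH_YEH, HEH_GOAL):
            buf[i] = HEH
        elif ch == HAMZA_ABOVE:
            # Assumption: drop the combining mark by shifting the tail left.
            buf[i:length - 1] = buf[i + 1:length]
            length -= 1
            continue  # re-examine the character now at position i
        i += 1
    return length

term = list("\u06a9\u062a\u0627\u0628\u06cc")  # keheh, teh, alef, beh, Farsi yeh
new_len = normalize(term, len(term))
normalized = "".join(term[:new_len])
```

Returning the new length instead of a new string keeps the sketch close to the in-place, term-buffer style described above: the caller reuses the same buffer and reads only the first `new_len` characters.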

    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.