    Namespace Lucene.Net.Analysis.Fa

    Analyzer for Persian.

    Classes

    PersianAnalyzer

    Lucene.Net.Analysis.Analyzer for Persian.

    This Analyzer uses PersianCharFilter, which causes text to be tokenized around the zero-width non-joiner (ZWNJ) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
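
    One plausible reading of "stemming via stopwords": once each ZWNJ becomes a space, an attached affix such as the plural marker ها turns into a separate token, which a stopword filter can drop. The Python sketch below assumes that reading. `toy_analyze` and the one-entry `SAMPLE_STOPWORDS` set are hypothetical (not Lucene's Persian stopword list), and the sketch skips the lowercasing and normalization a real analyzer chain performs.

```python
from typing import List

ZWNJ = "\u200c"
# Hypothetical one-entry stopword set holding the plural suffix "ها";
# NOT Lucene's Persian stopword list.
SAMPLE_STOPWORDS = {"\u0647\u0627"}


def toy_analyze(text: str) -> List[str]:
    """Treat ZWNJ as a space, split on whitespace, then drop stopword tokens."""
    spaced = text.replace(ZWNJ, " ")  # the PersianCharFilter step
    return [tok for tok in spaced.split() if tok not in SAMPLE_STOPWORDS]


books = "\u06a9\u062a\u0627\u0628" + ZWNJ + "\u0647\u0627"  # "کتاب‌ها" (books)
print(toy_analyze(books))  # one token left: the stem "کتاب"
```

    Only the bare stem survives, which is the effect a stemmer would otherwise have to produce by suffix stripping.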

    PersianCharFilter

    Lucene.Net.Analysis.CharFilter that replaces instances of the zero-width non-joiner (ZWNJ, U+200C) with an ordinary space.
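
    Because each ZWNJ is swapped for exactly one space, the filtered text has the same length as the input, so character offsets line up without correction; and because ZWNJ is a single character, chunk boundaries in a streaming reader cannot split it. A minimal Python sketch of that replacement, using the hypothetical names `replace_zwnj` and `filter_stream` (not the Lucene.Net API):

```python
ZWNJ = "\u200c"


def replace_zwnj(chunk: str) -> str:
    """Replace every zero-width non-joiner with an ASCII space.

    A one-for-one character swap, so output offsets match input offsets.
    """
    return chunk.replace(ZWNJ, " ")


def filter_stream(chunks):
    """Apply the replacement chunk by chunk, as a streaming char filter would."""
    for chunk in chunks:
        yield replace_zwnj(chunk)


word = "\u0645\u06cc" + ZWNJ + "\u0631\u0648\u0645"  # "می‌روم" (I go)
out = "".join(filter_stream([word[:2], word[2:]]))   # read in two chunks
print(out.split())  # two whitespace-separated tokens
```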

    PersianCharFilterFactory

    Factory for PersianCharFilter.

    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

    PersianNormalizationFilter

    A Lucene.Net.Analysis.TokenFilter that applies PersianNormalizer to normalize the orthography.

    PersianNormalizationFilterFactory

    Factory for PersianNormalizationFilter.

    <fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    PersianNormalizer

    Normalizer for Persian.

    Normalization is done in-place for efficiency, operating on a termbuffer.

    Normalization is defined as:

    • Normalization of various heh + hamza forms and heh goal to heh.
    • Normalization of Farsi yeh and yeh barree to Arabic yeh.
    • Normalization of Persian keheh to Arabic kaf.
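
    These rules can be sketched as an in-place pass over a character buffer plus a logical length, mirroring the termbuffer description above. This is a hedged Python approximation, not the library code: `normalize` is a hypothetical name, and deleting every combining hamza above (U+0654), so that heh + hamza collapses to heh, is an assumption of this sketch.

```python
# Code points for the characters named in the rules above.
YEH, FARSI_YEH, YEH_BARREE = "\u064a", "\u06cc", "\u06d2"
KAF, KEHEH = "\u0643", "\u06a9"
HEH, HEH_YEH, HEH_GOAL = "\u0647", "\u06c0", "\u06c1"
HAMZA_ABOVE = "\u0654"

# One-to-one character replacements.
REPLACEMENTS = {
    FARSI_YEH: YEH,
    YEH_BARREE: YEH,
    KEHEH: KAF,
    HEH_YEH: HEH,   # precomposed heh with yeh above (the Persian "heh + hamza" form)
    HEH_GOAL: HEH,
}


def normalize(buf, length):
    """Normalize buf[:length] in place and return the new logical length."""
    i = 0
    while i < length:
        if buf[i] == HAMZA_ABOVE:
            # Assumption: delete the combining hamza by shifting the tail left;
            # the list keeps its size, as a fixed term buffer would.
            buf[i:length - 1] = buf[i + 1:length]
            length -= 1
            continue  # re-examine the character now at position i
        buf[i] = REPLACEMENTS.get(buf[i], buf[i])
        i += 1
    return length


buf = list("\u062e\u0627\u0646\u0647\u0654")  # "خانهٔ": heh followed by hamza above
n = normalize(buf, len(buf))
print(n, "".join(buf[:n]))  # 4 خانه
```

    Returning a new length instead of resizing the buffer is what lets deletions happen without allocating a new string.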

    PersianStemFilter

    A Lucene.Net.Analysis.TokenFilter that applies PersianStemmer to stem Persian words.

    To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.IKeywordAttribute before this Lucene.Net.Analysis.TokenStream.
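
    Conceptually, a keyword marker flags certain terms and any stemming filter later in the chain leaves flagged terms alone. The Python sketch below models that contract with hypothetical names (`Token.is_keyword` standing in for IKeywordAttribute, `mark_keywords`, `stem_tokens`) and a deliberately naive stand-in stemmer, `toy_stem`; it is not the Lucene.Net API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Token:
    text: str
    is_keyword: bool = False  # plays the role of IKeywordAttribute


def mark_keywords(tokens: Iterable[Token], protected: Set[str]) -> List[Token]:
    """Flag tokens found in `protected`, like a keyword-marker filter placed first."""
    out = []
    for tok in tokens:
        if tok.text in protected:
            tok.is_keyword = True
        out.append(tok)
    return out


def stem_tokens(tokens: Iterable[Token], stem: Callable[[str], str]) -> List[Token]:
    """Apply `stem` only to tokens that are not flagged as keywords."""
    return [tok if tok.is_keyword else Token(stem(tok.text)) for tok in tokens]


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strip a trailing alef + noon ("ان")."""
    suffix = "\u0627\u0646"
    if word.endswith(suffix) and len(word) >= len(suffix) + 2:
        return word[:-len(suffix)]
    return word


trees = Token("\u062f\u0631\u062e\u062a\u0627\u0646")  # "درختان" (trees)
iran = Token("\u0627\u06cc\u0631\u0627\u0646")         # "ایران" (Iran)
marked = mark_keywords([trees, iran], {"\u0627\u06cc\u0631\u0627\u0646"})
result = [t.text for t in stem_tokens(marked, toy_stem)]
print(result)  # "درختان" is stemmed to "درخت"; "ایران" is left intact
```

    Without the marker, `toy_stem` would cut ایران down to ایر, which is the kind of over-stemming the keyword flag exists to prevent. The marker must run before the stemmer for the flag to have any effect.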

    PersianStemFilterFactory

    Factory for PersianStemFilter.

    <fieldType name="text_fastem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
        <filter class="solr.PersianStemFilterFactory"/>
      </analyzer>
    </fieldType>

    PersianStemmer

    Stemmer for Persian.

    Stemming is done in-place for efficiency, operating on a termbuffer.

    Stemming is defined as:
    • Removal of attached definite article, conjunction, and prepositions.
    • Stemming of common suffixes.
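
    The suffix-stripping part of this definition can be sketched in the same in-place style: a suffix sits at the end of the term, so removing it only shortens the logical length. This Python sketch covers suffixes only (not the prefix removal in the first bullet), and the suffix list, its order, and the two-character minimum stem (`MIN_STEM`) are illustrative assumptions, not PersianStemmer's actual rules.

```python
# Hypothetical, illustrative suffix list, written in normalized form
# (Arabic yeh); NOT the library's list.
SUFFIXES = [
    "\u062a\u0631\u064a\u0646",  # ترين (superlative)
    "\u062a\u0631",              # تر (comparative)
    "\u0627\u0646",              # ان (plural)
    "\u0627\u062a",              # ات (plural)
    "\u0647\u0627",              # ها (plural)
]
MIN_STEM = 2  # assumed minimum number of characters left after stripping


def ends_with(buf, length, suffix):
    """True if buf[:length] ends with suffix and enough stem would remain."""
    if length < len(suffix) + MIN_STEM:
        return False
    start = length - len(suffix)
    return all(buf[start + i] == ch for i, ch in enumerate(suffix))


def stem(buf, length):
    """Strip matching suffixes from buf[:length] in place; return the new length."""
    for suffix in SUFFIXES:
        if ends_with(buf, length, suffix):
            length -= len(suffix)  # removing a suffix only shortens the logical length
    return length


buf = list("\u0628\u0632\u0631\u06af\u062a\u0631\u064a\u0646")  # "بزرگترين" (biggest)
n = stem(buf, len(buf))
print(n, "".join(buf[:n]))  # 4 بزرگ
```

    The minimum-length check keeps short words such as تر ("wet") from being reduced to nothing.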

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.