
    Namespace Lucene.Net.Analysis.Ckb

    Analyzer for Sorani Kurdish.

    Classes

    SoraniAnalyzer

    Analyzer for Sorani Kurdish.

    SoraniNormalizationFilter

    A TokenFilter that applies SoraniNormalizer to normalize the orthography.

    SoraniNormalizationFilterFactory

    Factory for SoraniNormalizationFilter.

    <fieldType name="text_ckbnormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SoraniNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    SoraniNormalizer

    Normalizes the Unicode representation of Sorani text.

    Normalization consists of:

    • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
    • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
    • Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
    • Alternate (joining) form of 'h' (06BE) is converted to 0647 (HEH)
    • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
    • Harakat, tatweel, and formatting characters such as directional controls are removed.
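
    The rules above can be sketched as a single pass over one token. This is a minimal, standalone Python illustration, not the SoraniNormalizer source: the function name normalize_sorani, the constant names, and the order in which the rules are applied are assumptions made for the example. "Harakat" is taken to mean U+064B through U+0652, and "formatting characters" to mean Unicode general category Cf.

```python
import unicodedata

# Code points named in the rule list above; the identifiers are illustrative.
YEH, DOTLESS_YEH, FARSI_YEH = "\u064A", "\u0649", "\u06CC"
KAF, KEHEH = "\u0643", "\u06A9"
HEH, AE, TEH_MARBUTA = "\u0647", "\u06D5", "\u0629"
HEH_DOACHASHMEE = "\u06BE"
REH, RREH, RREH_ABOVE = "\u0631", "\u0695", "\u0692"
ZWNJ, TATWEEL = "\u200C", "\u0640"
# Assumption: harakat = FATHATAN (U+064B) through SUKUN (U+0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def normalize_sorani(token: str) -> str:
    """Apply the documented normalization rules to a single token."""
    out = []
    last = len(token) - 1
    for i, ch in enumerate(token):
        if ch in (YEH, DOTLESS_YEH):
            out.append(FARSI_YEH)
        elif ch == KAF:
            out.append(KEHEH)
        elif ch == ZWNJ:
            # 0647 + 200C spells the vowel 'e': rewrite the heh, drop the ZWNJ.
            if out and out[-1] == HEH:
                out[-1] = AE
        elif ch == HEH:
            # Only a word-final heh is treated as the vowel 'e'.
            out.append(AE if i == last else HEH)
        elif ch == TEH_MARBUTA:
            out.append(AE)
        elif ch == HEH_DOACHASHMEE:
            out.append(HEH)
        elif ch == REH:
            # Word-initial means nothing has been emitted for this token yet.
            out.append(RREH if not out else REH)
        elif ch == RREH_ABOVE:
            out.append(RREH)
        elif ch == TATWEEL or ch in HARAKAT:
            continue  # removed
        elif unicodedata.category(ch) == "Cf":
            continue  # formatting characters such as directional controls
        else:
            out.append(ch)
    return "".join(out)
```

    Because the sketch works on one token at a time, "word-final" and "word-initial" refer to the ends of the token produced by the tokenizer, which is why the filter is placed after the tokenizer in the field type shown earlier.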

    SoraniStemFilter

    A TokenFilter that applies SoraniStemmer to stem Sorani words.

    To prevent terms from being stemmed, place an instance of SetKeywordMarkerFilter, or a custom TokenFilter that sets the KeywordAttribute, before this TokenStream.
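
    The keyword mechanism can be pictured with a small standalone Python sketch. It does not use the Lucene.NET API: Token, mark_keywords, stem_filter, and toy_stem are hypothetical names, the is_keyword flag stands in for KeywordAttribute, and toy_stem is a placeholder that strips a trailing "s" rather than the Sorani stemming algorithm.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    is_keyword: bool = False  # stands in for KeywordAttribute


def mark_keywords(tokens, protected):
    """Flag tokens whose text is in `protected` (the role of a keyword marker filter)."""
    for tok in tokens:
        if tok.text in protected:
            tok.is_keyword = True
        yield tok


def stem_filter(tokens, stem):
    """Apply `stem` to every token that is not flagged as a keyword."""
    for tok in tokens:
        if not tok.is_keyword:
            tok.text = stem(tok.text)
        yield tok


def toy_stem(word):
    # Hypothetical placeholder stemmer, NOT the Sorani algorithm.
    return word[:-1] if word.endswith("s") else word


# The marker runs before the stem filter, so "paris" passes through unchanged.
tokens = [Token("cats"), Token("paris"), Token("dogs")]
result = [t.text for t in stem_filter(mark_keywords(tokens, {"paris"}), toy_stem)]
```

    The ordering is the point of the sketch: the flag must already be set when the token reaches the stemming stage, which is why the marker filter has to come earlier in the chain.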

    SoraniStemFilterFactory

    Factory for SoraniStemFilter.

    <fieldType name="text_ckbstem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SoraniNormalizationFilterFactory"/>
        <filter class="solr.SoraniStemFilterFactory"/>
      </analyzer>
    </fieldType>

    SoraniStemmer

    Light stemmer for Sorani.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)