    Namespace Lucene.Net.Analysis.Ckb

    Analyzer for Sorani Kurdish.

    Classes

    SoraniAnalyzer

    A Lucene.Net.Analysis.Analyzer for Sorani Kurdish.

    SoraniNormalizationFilter

    A Lucene.Net.Analysis.TokenFilter that applies SoraniNormalizer to normalize the orthography.

    SoraniNormalizationFilterFactory

    Factory for SoraniNormalizationFilter.

    <fieldType name="text_ckbnormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SoraniNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    SoraniNormalizer

    Normalizes the Unicode representation of Sorani text.

    Normalization consists of:

    • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
    • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
    • Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
    • Alternate (joining) form of 'h' (06BE) is converted to 0647
    • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
    • Harakat, tatweel, and formatting characters such as directional controls are removed.
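
    The rules above can be sketched as a single left-to-right pass over a token. This is a simplified Python reimplementation written for illustration only, not the Lucene.Net SoraniNormalizer source; the function name normalize_sorani and the constant names are hypothetical, and edge cases may be handled differently by the real class.

```python
import unicodedata

# Code points taken from the normalization rules listed above.
YEH, DOTLESS_YEH, FARSI_YEH = "\u064A", "\u0649", "\u06CC"
KAF, KEHEH = "\u0643", "\u06A9"
HEH, TEH_MARBUTA, AE = "\u0647", "\u0629", "\u06D5"
HEH_DOACHASHMEE = "\u06BE"
REH, RREH_ABOVE, RREH = "\u0631", "\u0692", "\u0695"
ZWNJ, TATWEEL = "\u200C", "\u0640"
# Harakat: FATHATAN (064B) through SUKUN (0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def normalize_sorani(token: str) -> str:
    """Hypothetical, simplified sketch of the documented rules (not Lucene.Net's code)."""
    out: list[str] = []
    last = len(token) - 1
    for i, ch in enumerate(token):
        if ch in (YEH, DOTLESS_YEH):
            out.append(FARSI_YEH)
        elif ch == KAF:
            out.append(KEHEH)
        elif ch == ZWNJ:
            # 0647 + 200C spells the vowel 'e': rewrite the HEH, then drop the ZWNJ.
            if out and out[-1] == HEH:
                out[-1] = AE
        elif ch == HEH:
            # A word-final 0647 is also the vowel 'e'.
            out.append(AE if i == last else HEH)
        elif ch == TEH_MARBUTA:
            out.append(AE)
        elif ch == HEH_DOACHASHMEE:
            out.append(HEH)
        elif ch == REH:
            # Only a word-initial 0631 becomes 'rr'.
            out.append(RREH if i == 0 else REH)
        elif ch == RREH_ABOVE:
            out.append(RREH)
        elif ch == TATWEEL or ch in HARAKAT:
            continue  # drop tatweel and harakat
        elif unicodedata.category(ch) == "Cf":
            continue  # drop other formatting characters, e.g. directional marks
        else:
            out.append(ch)
    return "".join(out)


assert normalize_sorani("\u0643\u064A") == "\u06A9\u06CC"  # kaf + yeh -> keheh + farsi yeh
```

    Because ZWNJ is itself a formatting character, the sketch uses it only to detect the 0647+200C form and never copies it to the output.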

    SoraniStemFilter

    A Lucene.Net.Analysis.TokenFilter that applies SoraniStemmer to stem Sorani words.

    To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute before this Lucene.Net.Analysis.TokenStream.
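
    The keyword-marking contract can be sketched with plain Python generators standing in for the token stream: a marker stage flags protected terms, and the stem stage leaves flagged terms untouched. Token, set_keyword_marker, stem_filter and toy_stem are hypothetical stand-ins for KeywordAttribute, SetKeywordMarkerFilter, SoraniStemFilter and SoraniStemmer; toy_stem merely strips a trailing ان and is not the real stemming logic.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Token:
    text: str
    keyword: bool = False  # plays the role of KeywordAttribute


def set_keyword_marker(tokens: Iterable[Token], protected: set[str]) -> Iterator[Token]:
    """Hypothetical stand-in for SetKeywordMarkerFilter: flag protected terms."""
    for tok in tokens:
        if tok.text in protected:
            tok.keyword = True
        yield tok


def stem_filter(tokens: Iterable[Token], stem: Callable[[str], str]) -> Iterator[Token]:
    """Hypothetical stand-in for SoraniStemFilter: skip tokens flagged as keywords."""
    for tok in tokens:
        if not tok.keyword:
            tok.text = stem(tok.text)
        yield tok


def toy_stem(word: str) -> str:
    """Toy stemmer for illustration only: strips a trailing ان."""
    suffix = "\u0627\u0646"
    if word.endswith(suffix) and len(word) > len(suffix) + 1:
        return word[: -len(suffix)]
    return word


# The marker stage wraps the source, so it runs before the stem stage.
tokens = [Token("کوردان"), Token("ئێران")]
stream = stem_filter(set_keyword_marker(tokens, {"ئێران"}), toy_stem)
print([t.text for t in stream])
```

    Here کوردان is reduced by the toy stemmer while the protected ئێران passes through unchanged. If the marker stage were placed after the stem stage instead, the flag would be set too late to have any effect.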

    SoraniStemFilterFactory

    Factory for SoraniStemFilter.

    <fieldType name="text_ckbstem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SoraniNormalizationFilterFactory"/>
        <filter class="solr.SoraniStemFilterFactory"/>
      </analyzer>
    </fieldType>

    SoraniStemmer

    Light stemmer for Sorani.
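
    A light stemmer removes a small set of common affixes without consulting a dictionary. The sketch below shows only the general technique: try the longest matching suffix first, and refuse to strip if the remaining stem would be too short. The suffix list, the name light_stem and the min_stem threshold are hypothetical illustrations and are not the rule set SoraniStemmer actually applies.

```python
# Hypothetical, illustrative suffix list (NOT SoraniStemmer's actual rules).
SUFFIXES: tuple[str, ...] = (
    "\u06D5\u06A9\u0627\u0646",  # ەکان
    "\u06D5\u06A9\u06D5",        # ەکە
    "\u06CE\u06A9",              # ێک
    "\u0627\u0646",              # ان
)


def light_stem(word: str, suffixes: tuple[str, ...] = SUFFIXES, min_stem: int = 2) -> str:
    """Strip at most one suffix, trying the longest candidates first."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Keep at least `min_stem` characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print(light_stem("\u06A9\u0648\u0631\u062F\u06D5\u06A9\u0627\u0646"))  # strips ەکان
print(light_stem("\u0646\u0627\u0646"))  # stem would be too short, left as-is
```

    Trying longer suffixes first keeps a short suffix such as ان from pre-empting a longer one that ends with it.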

    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.