Namespace Lucene.Net.Analysis.Ckb
Analyzer for Sorani Kurdish.
Classes
SoraniAnalyzer
Analyzer for Sorani Kurdish.
SoraniNormalizationFilter
A TokenFilter that applies SoraniNormalizer to normalize the orthography.
SoraniNormalizationFilterFactory
Factory for SoraniNormalizationFilter.
<fieldType name="text_ckbnormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SoraniNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
SoraniNormalizer
Normalizes the Unicode representation of Sorani text.
Normalization consists of:
- Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
- Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
- Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
- Alternate (joining) form of 'h' (06BE) is converted to 0647
- Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
- Harakat, tatweel, and formatting characters such as directional controls are removed.
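The rules above can be illustrated with a small standalone sketch. It is written in Python purely for illustration and is not the Lucene.Net implementation; the function name normalize_sorani and the constant names are hypothetical, the harakat are assumed to be the range 064B to 0652, and "formatting characters" is assumed to mean Unicode general category Cf.

```python
import unicodedata

# Code points named in the normalization rules.
YEH, DOTLESS_YEH, FARSI_YEH = "\u064A", "\u0649", "\u06CC"
KAF, KEHEH = "\u0643", "\u06A9"
HEH, AE, TEH_MARBUTA, HEH_DOACHASHMEE = "\u0647", "\u06D5", "\u0629", "\u06BE"
REH, RREH, RREH_ABOVE = "\u0631", "\u0695", "\u0692"
ZWNJ, TATWEEL = "\u200C", "\u0640"
# Assumption: harakat are the combining marks U+064B..U+0652.
HARAKAT = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")


def normalize_sorani(token: str) -> str:
    """Apply the listed rules to a single token (one word) in one pass."""
    out = []
    last = len(token) - 1
    for i, ch in enumerate(token):
        if ch in (YEH, DOTLESS_YEH):
            out.append(FARSI_YEH)
        elif ch == KAF:
            out.append(KEHEH)
        elif ch == ZWNJ:
            # 0647 followed by 200C is an alternate 'e': rewrite the
            # preceding HEH to AE; the ZWNJ itself is always dropped.
            if out and out[-1] == HEH:
                out[-1] = AE
        elif ch == HEH:
            # Only a word-final HEH is treated as the vowel 'e'.
            out.append(AE if i == last else HEH)
        elif ch == TEH_MARBUTA:
            out.append(AE)
        elif ch == HEH_DOACHASHMEE:
            out.append(HEH)
        elif ch == REH:
            # Only a word-initial REH is treated as 'rr'.
            out.append(RREH if not out else REH)
        elif ch == RREH_ABOVE:
            out.append(RREH)
        elif ch == TATWEEL or ch in HARAKAT:
            continue  # removed
        elif unicodedata.category(ch) == "Cf":
            continue  # formatting characters such as directional controls
        else:
            out.append(ch)
    return "".join(out)
```

Calling normalize_sorani on a token that contains 0643 followed by 064A, for example, yields 06A9 followed by 06CC. The function works on one token at a time because the word-initial and word-final rules depend on token boundaries.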
 
SoraniStemFilter
A TokenFilter that applies SoraniStemmer to stem Sorani words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.
SoraniStemFilterFactory
Factory for SoraniStemFilter.
<fieldType name="text_ckbstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SoraniNormalizationFilterFactory"/>
    <filter class="solr.SoraniStemFilterFactory"/>
  </analyzer>
</fieldType>
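To keep selected terms out of the stemmer, as described for SoraniStemFilter, a keyword marker filter can be placed before the stem filter. The following is a sketch only: the field type name text_ckbstem_protected and the file name protwords.txt are assumptions chosen for illustration, and the file would list one protected term per line.

```xml
<fieldType name="text_ckbstem_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SoraniNormalizationFilterFactory"/>
    <!-- Marks terms listed in protwords.txt as keywords so the stemmer skips them -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SoraniStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The marker filter comes after normalization so that protected terms are matched in their normalized form, which means the entries in the protected-words file should be normalized as well.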
SoraniStemmer
Light stemmer for Sorani.