Namespace Lucene.Net.Analysis.Hi
Analyzer for Hindi.
Classes
HindiAnalyzer
Analyzer for Hindi.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating a HindiAnalyzer:
- As of 3.6, StandardTokenizer is used for tokenization
HindiNormalizationFilter
A Lucene.Net.Analysis.TokenFilter that applies HindiNormalizer to normalize the orthography.
In some cases the normalization may cause unrelated terms to conflate. To prevent specific terms from being normalized, place an instance of SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.KeywordAttribute, before this Lucene.Net.Analysis.TokenStream.
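The ordering requirement above can be illustrated with a small, language-neutral Python sketch. It does not use the Lucene.NET API: Token, mark_keywords, normalization_filter, toy_normalize and run_pipeline are hypothetical names that only model a keyword flag being set by an upstream stage and then honored by a later normalizing stage.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set


@dataclass
class Token:
    term: str
    is_keyword: bool = False  # stands in for the keyword attribute


def mark_keywords(tokens: Iterable[Token], protected: Set[str]) -> Iterator[Token]:
    """Upstream stage: flag every token whose term is in the protected set."""
    for tok in tokens:
        if tok.term in protected:
            tok.is_keyword = True
        yield tok


def normalization_filter(
    tokens: Iterable[Token], normalize: Callable[[str], str]
) -> Iterator[Token]:
    """Downstream stage: normalize only tokens that were not flagged."""
    for tok in tokens:
        if not tok.is_keyword:
            tok.term = normalize(tok.term)
        yield tok


def toy_normalize(term: str) -> str:
    """Toy stand-in for a normalizer: chandrabindu (U+0901) -> anusvara (U+0902)."""
    return term.replace("\u0901", "\u0902")


def run_pipeline(terms: Iterable[str], protected: Set[str]) -> List[str]:
    """Chain the two stages in the required order: marker first, normalizer second."""
    tokens = (Token(t) for t in terms)
    marked = mark_keywords(tokens, protected)
    return [t.term for t in normalization_filter(marked, toy_normalize)]
```

If the two stages were chained in the opposite order, the flag would be set only after normalization had already run, so the protected terms would be rewritten anyway.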
HindiNormalizationFilterFactory
Factory for HindiNormalizationFilter.
<fieldType name="text_hinormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
HindiNormalizer
Normalizer for Hindi.
Normalizes text to remove some differences in spelling variations.
Implements the Hindi-specific algorithm described in:
Word normalization in Indian languages
Prasad Pingali and Vasudeva Varma.
http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf
with the following additions from Hindi CLIR in Thirty Days
Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel.
http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454:
- Internal zero-width joiners and zero-width non-joiners are removed
- In addition to chandrabindu, NA+halant is normalized to anusvara
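The two additions listed above can be sketched in a few lines of Python. This is a minimal illustration, not the HindiNormalizer implementation: it covers only these two rules and none of the rules from the Pingali and Varma paper, the function name normalize_additions is hypothetical, and as a simplification it removes every zero-width joiner or non-joiner rather than only internal ones.

```python
ZWNJ = "\u200c"          # zero-width non-joiner
ZWJ = "\u200d"           # zero-width joiner
CHANDRABINDU = "\u0901"  # DEVANAGARI SIGN CANDRABINDU
ANUSVARA = "\u0902"      # DEVANAGARI SIGN ANUSVARA
NA = "\u0928"            # DEVANAGARI LETTER NA
HALANT = "\u094d"        # DEVANAGARI SIGN VIRAMA (halant)


def normalize_additions(text: str) -> str:
    """Apply only the two additional rules; a sketch under the stated assumptions."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in (ZWJ, ZWNJ):
            # Simplification: drop all joiners, not only internal ones.
            i += 1
        elif ch == CHANDRABINDU:
            out.append(ANUSVARA)
            i += 1
        elif ch == NA and i + 1 < len(text) and text[i + 1] == HALANT:
            # NA followed by halant collapses to a single anusvara.
            out.append(ANUSVARA)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these rules the spellings अन्त (with NA+halant) and अंत (with anusvara) map to the same string, which is the kind of spelling variation the normalizer is meant to remove.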
HindiStemFilter
A Lucene.Net.Analysis.TokenFilter that applies HindiStemmer to stem Hindi words.
HindiStemFilterFactory
Factory for HindiStemFilter.
<fieldType name="text_histem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.HindiStemFilterFactory"/>
  </analyzer>
</fieldType>
HindiStemmer
Light stemmer for Hindi.
Implements the algorithm specified in:
A Lightweight Stemmer for Hindi
Ananthakrishnan Ramanathan and Durgesh D Rao.
http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf
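As a rough illustration of what a light stemmer does, the following Python sketch strips the longest matching suffix from a word. It is not the HindiStemmer implementation: the suffix tuple is a small illustrative subset chosen for this example rather than the list from the cited paper, and the names ILLUSTRATIVE_SUFFIXES, light_stem and min_stem_len, as well as the minimum-stem-length guard, are assumptions of the sketch.

```python
# Small illustrative subset of Hindi suffixes (an assumption of this sketch).
ILLUSTRATIVE_SUFFIXES = (
    "\u093f\u092f\u094b\u0902",  # ियों
    "\u094b\u0902",              # ों
    "\u0947\u0902",              # ें
    "\u0940",                    # ी
    "\u093e",                    # ा
)


def light_stem(word: str, min_stem_len: int = 2) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in sorted(ILLUSTRATIVE_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word
```

With this subset, the inflected forms लड़की, लड़कों and लड़कियों all reduce to the same stem, while a short word such as की is left unchanged because stripping would leave fewer than min_stem_len code points.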