Namespace Lucene.Net.Analysis.Hi
Analyzer for Hindi.
Classes
HindiAnalyzer
Analyzer for Hindi.
You must specify the required LuceneVersion compatibility when creating HindiAnalyzer:
- As of 3.6, StandardTokenizer is used for tokenization.
HindiNormalizationFilter
A TokenFilter that applies HindiNormalizer to normalize the orthography.
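In some cases the normalization may cause unrelated terms to conflate, so to prevent terms from being normalized, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.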
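The keyword-protection mechanism works because the normalization filter leaves any token already marked as a keyword untouched. The Python sketch below models that idea only; `Token`, `keyword_marker`, `normalization_filter` and `toy_normalize` are hypothetical names, the `is_keyword` flag stands in for Lucene's KeywordAttribute, and `toy_normalize` is a deliberately tiny placeholder rather than the real HindiNormalizer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Set


@dataclass
class Token:
    text: str
    is_keyword: bool = False  # hypothetical stand-in for Lucene's KeywordAttribute


def keyword_marker(tokens: Iterable[Token], protected: Set[str]) -> Iterator[Token]:
    """Mark every token whose text appears in `protected` as a keyword."""
    for tok in tokens:
        yield Token(tok.text, tok.is_keyword or tok.text in protected)


def normalization_filter(tokens: Iterable[Token],
                         normalize: Callable[[str], str]) -> Iterator[Token]:
    """Normalize each token unless an earlier stage marked it as a keyword."""
    for tok in tokens:
        if tok.is_keyword:
            yield tok  # protected: passed through unchanged
        else:
            yield Token(normalize(tok.text), False)


def toy_normalize(text: str) -> str:
    # Placeholder rule only: chandrabindu (U+0901) -> anusvara (U+0902).
    return text.replace("\u0901", "\u0902")


if __name__ == "__main__":
    words = [Token("\u0939\u0901\u0938"), Token("\u091a\u093e\u0901\u0926")]
    protected = {"\u091a\u093e\u0901\u0926"}  # keep this spelling as-is
    for tok in normalization_filter(keyword_marker(words, protected), toy_normalize):
        print(tok.text, tok.is_keyword)
```

Here the first word is rewritten with an anusvara, while the protected word keeps its chandrabindu. The marker must run before the normalizing stage, which mirrors the requirement that the KeywordAttribute be set before this TokenStream.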
HindiNormalizationFilterFactory
Factory for HindiNormalizationFilter.
<fieldType name="text_hinormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
HindiNormalizer
Normalizer for Hindi.
Normalizes text to remove some differences in spelling variations.
Implements the Hindi-language-specific algorithm specified in:
Word normalization in Indian languages
Prasad Pingali and Vasudeva Varma.
http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf
with the following additions from Hindi CLIR in Thirty Days
Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel.
http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454:
- Internal zero-width joiners and zero-width non-joiners are removed.
- In addition to chandrabindu, NA+halant is also normalized to anusvara.
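The two additions listed above can be illustrated as plain string rewrites on a single token. This is a minimal, partial sketch assuming tokens arrive as Python strings: it covers only the zero-width removal, the NA+halant rule and the chandrabindu mapping, not the rest of HindiNormalizer, and `normalize_hindi` is a hypothetical name. As a simplification of "internal", it removes every ZWJ/ZWNJ in the token.

```python
ZWJ = "\u200d"           # zero-width joiner
ZWNJ = "\u200c"          # zero-width non-joiner
NA = "\u0928"            # DEVANAGARI LETTER NA
HALANT = "\u094d"        # DEVANAGARI SIGN VIRAMA (halant)
ANUSVARA = "\u0902"      # DEVANAGARI SIGN ANUSVARA
CHANDRABINDU = "\u0901"  # DEVANAGARI SIGN CANDRABINDU


def normalize_hindi(token: str) -> str:
    """Apply only the listed normalization rules to one token (partial sketch)."""
    # 1. Drop zero-width joiners and non-joiners (all occurrences, for simplicity).
    token = token.replace(ZWJ, "").replace(ZWNJ, "")
    # 2. Dead NA (NA followed by halant) becomes anusvara.
    token = token.replace(NA + HALANT, ANUSVARA)
    # 3. Chandrabindu becomes anusvara as well.
    token = token.replace(CHANDRABINDU, ANUSVARA)
    return token


if __name__ == "__main__":
    with_dead_na = "\u0939\u093f\u0928\u094d\u0926\u0940"  # "Hindi" spelled with NA+halant
    with_anusvara = "\u0939\u093f\u0902\u0926\u0940"       # "Hindi" spelled with anusvara
    print(normalize_hindi(with_dead_na) == normalize_hindi(with_anusvara))
```

The usage lines show the intent of the rule: the two common spellings of the word "Hindi" reduce to the same normalized form, so a query in one spelling can match documents written in the other.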
HindiStemFilter
A TokenFilter that applies HindiStemmer to stem Hindi words.
HindiStemFilterFactory
Factory for HindiStemFilter.
<fieldType name="text_histem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.HindiStemFilterFactory"/>
  </analyzer>
</fieldType>
HindiStemmer
Light Stemmer for Hindi.
Implements the algorithm specified in:
A Lightweight Stemmer for Hindi
Ananthakrishnan Ramanathan and Durgesh D Rao.
http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf
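A light stemmer strips inflectional suffixes instead of performing full morphological analysis. The Python sketch below shows a generic longest-match suffix stripper in that spirit. The suffix list and the minimum stem length `MIN_STEM` are illustrative assumptions, not the list from the paper or from HindiStemmer, and `light_stem` is a hypothetical name.

```python
# Illustrative subset of Hindi inflectional endings (dependent vowel signs,
# optionally followed by anusvara). NOT the suffix list used by HindiStemmer.
SUFFIXES = sorted(
    [
        "\u094b\u0902",  # -on (plural oblique)
        "\u0947\u0902",  # -en
        "\u093e",        # -aa
        "\u0947",        # -e
        "\u0940",        # -ii
        "\u094b",        # -o
    ],
    key=len,
    reverse=True,  # try longer suffixes first so the longest match wins
)

MIN_STEM = 2  # assumed minimum stem length, in code points


def light_stem(word: str) -> str:
    """Remove the longest listed suffix, provided a long-enough stem remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word  # no suffix applies: return the word unchanged


if __name__ == "__main__":
    forms = ["\u0915\u092e\u0930\u093e",        # singular form
             "\u0915\u092e\u0930\u0947",        # oblique/plural form
             "\u0915\u092e\u0930\u094b\u0902"]  # plural oblique form
    print({light_stem(f) for f in forms})
```

With these assumed suffixes, the three inflected forms of the same noun collapse to a single stem, while very short words are left alone because removing a suffix would leave fewer than `MIN_STEM` characters. That length guard is the usual trade-off in light stemming: it avoids over-stemming at the cost of missing some conflations.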