Namespace Lucene.Net.Analysis.Hi

Analyzer for Hindi.

Classes

HindiAnalyzer

Analyzer for Hindi.

You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating HindiAnalyzer:

As of 3.6, StandardTokenizer is used for tokenization.

HindiNormalizationFilter

A Lucene.Net.Analysis.TokenFilter that applies HindiNormalizer to normalize the orthography.

In some cases, normalization may cause unrelated terms to conflate. To prevent specific terms from being normalized, place a SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.IKeywordAttribute, before this filter in the Lucene.Net.Analysis.TokenStream.
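The keyword-marking mechanism can be modeled with a small Python generator chain standing in for a TokenStream. This is a sketch, not Lucene.Net code: `Token`, `mark_keywords`, `normalize_tokens`, `chandrabindu_to_anusvara`, and `analyze` are hypothetical names, the `is_keyword` flag plays the role of IKeywordAttribute, and the normalizer is reduced to a single chandrabindu-to-anusvara rule for brevity.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Token:
    text: str
    is_keyword: bool = False  # plays the role of IKeywordAttribute.IsKeyword


def mark_keywords(tokens: Iterable[Token], protected: set[str]) -> Iterator[Token]:
    """Rough analogue of SetKeywordMarkerFilter: flag tokens whose text is in `protected`."""
    for tok in tokens:
        yield replace(tok, is_keyword=True) if tok.text in protected else tok


def normalize_tokens(tokens: Iterable[Token], normalize: Callable[[str], str]) -> Iterator[Token]:
    """Apply `normalize` only to tokens that were not marked as keywords upstream."""
    for tok in tokens:
        yield tok if tok.is_keyword else replace(tok, text=normalize(tok.text))


def chandrabindu_to_anusvara(word: str) -> str:
    """Stand-in for the full normalizer: maps U+0901 (chandrabindu) to U+0902 (anusvara)."""
    return word.replace("\u0901", "\u0902")


def analyze(words: list[str], protected: set[str]) -> list[str]:
    # The marker runs before the normalizer, so the keyword flag is already set when it is checked.
    marked = mark_keywords((Token(w) for w in words), protected)
    return [tok.text for tok in normalize_tokens(marked, chandrabindu_to_anusvara)]
```

With `analyze(["हँसी", "चाँद"], protected={"चाँद"})`, हँसी is normalized to हंसी while the protected चाँद keeps its chandrabindu. The ordering is the point of the design: a flag set after the normalizer has run would have no effect.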

HindiNormalizationFilterFactory

Factory for HindiNormalizationFilter.

<fieldType name="text_hinormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

HindiNormalizer

Normalizer for Hindi.

Normalizes text to remove some differences in spelling variations.

Implements the Hindi-language-specific algorithm specified in: Word normalization in Indian languages, Prasad Pingali and Vasudeva Varma. http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf

with the following additions from Hindi CLIR in Thirty Days, Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):

Internal zero-width joiners and zero-width non-joiners are removed.
In addition to chandrabindu, NA+halant is normalized to anusvara.
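The two additions above, together with the chandrabindu rule they extend, can be sketched in a few lines of Python. This covers only those three rules; the real HindiNormalizer applies further rules from the Pingali and Varma paper that are not shown here. `normalize_subset` is a hypothetical name, and for simplicity the sketch removes zero-width characters wherever they occur rather than only internally.

```python
ZWJ = "\u200d"           # ZERO WIDTH JOINER
ZWNJ = "\u200c"          # ZERO WIDTH NON-JOINER
NA = "\u0928"            # DEVANAGARI LETTER NA
HALANT = "\u094d"        # DEVANAGARI SIGN VIRAMA (halant)
CHANDRABINDU = "\u0901"  # DEVANAGARI SIGN CANDRABINDU
ANUSVARA = "\u0902"      # DEVANAGARI SIGN ANUSVARA


def normalize_subset(word: str) -> str:
    """Apply only the three rules listed above (not the full HindiNormalizer)."""
    # Remove zero-width joiners and non-joiners.
    word = word.replace(ZWJ, "").replace(ZWNJ, "")
    # Chandrabindu -> anusvara.
    word = word.replace(CHANDRABINDU, ANUSVARA)
    # NA + halant -> anusvara; the halant is consumed.
    return word.replace(NA + HALANT, ANUSVARA)
```

For example, सन्त (NA followed by halant) and संत (anusvara) both normalize to संत, so the two spellings are indexed as the same term.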

HindiStemFilter

A Lucene.Net.Analysis.TokenFilter that applies HindiStemmer to stem Hindi words.

HindiStemFilterFactory

Factory for HindiStemFilter.

<fieldType name="text_histem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.HindiStemFilterFactory"/>
  </analyzer>
</fieldType>

HindiStemmer

Light stemmer for Hindi.

Implements the algorithm specified in: A Lightweight Stemmer for Hindi, Ananthakrishnan Ramanathan and Durgesh D Rao. http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf
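A light stemmer of this kind strips inflectional suffixes by longest match. The Python sketch below illustrates the idea under stated assumptions: `SUFFIXES` is a small illustrative subset, not the full list HindiStemmer uses; the rule that at least two characters must remain is an assumption modeled on Lucene's length checks; and `light_stem` is a hypothetical name.

```python
# Illustrative subset of Hindi inflectional suffixes (not HindiStemmer's full list).
SUFFIXES = (
    # 5 characters
    "ाएंगी", "ाएंगे", "ाऊंगी", "ाऊंगा",
    # 4 characters
    "एंगी", "एंगे", "ियाँ", "ियों", "ियां",
    # 3 characters
    "ाकर", "ाने", "ाना", "ाते", "ाती", "ाता",
    # 2 characters
    "कर", "ने", "ना", "ते", "ती", "ता", "ों", "ें",
    # 1 character (dependent vowel signs)
    "ो", "े", "ू", "ु", "ी", "ि", "ा",
)

# Try longer suffixes first so the longest match wins.
_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)


def light_stem(word: str) -> str:
    """Strip the longest listed suffix that still leaves at least two characters."""
    for suffix in _BY_LENGTH:
        if len(word) > len(suffix) + 1 and word.endswith(suffix):
            return word[: -len(suffix)]
    return word
```

For instance, किताबें loses the plural ending ें and becomes किताब, while a short word such as की is left unchanged because stripping ी would leave a single character. When a long suffix matches but would leave too short a stem, the loop falls through to shorter suffixes instead of giving up.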