    Namespace Lucene.Net.Analysis.Hi

    Analyzer for Hindi.

    Classes

    HindiAnalyzer

    Analyzer for Hindi.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating a HindiAnalyzer. The version affects the following behavior:

    • As of 3.6, StandardTokenizer is used for tokenization.

    HindiNormalizationFilter

    A Lucene.Net.Analysis.TokenFilter that applies HindiNormalizer to normalize the orthography.

    Normalization can occasionally conflate unrelated terms. To exempt specific terms from normalization, place a SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute, before this filter in the Lucene.Net.Analysis.TokenStream chain.
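
    The ordering requirement can be illustrated with a small, language-neutral sketch in Python. It is not the Lucene.Net API: the Token class, the mark_keywords and normalize stages, and the single normalization rule are hypothetical stand-ins for KeywordAttribute, SetKeywordMarkerFilter, and HindiNormalizationFilter. It only shows why the marker stage has to run before the normalization stage.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set


@dataclass
class Token:
    term: str
    keyword: bool = False  # hypothetical stand-in for KeywordAttribute


def mark_keywords(tokens: Iterable[Token], protected: Set[str]) -> Iterator[Token]:
    """Flag protected terms (the role SetKeywordMarkerFilter plays)."""
    for tok in tokens:
        if tok.term in protected:
            tok.keyword = True
        yield tok


def normalize(tokens: Iterable[Token]) -> Iterator[Token]:
    """Toy normalization stage that leaves keyword-flagged tokens alone."""
    for tok in tokens:
        if not tok.keyword:
            # Assumed example rule: chandrabindu (U+0901) -> anusvara (U+0902).
            tok.term = tok.term.replace("\u0901", "\u0902")
        yield tok


def analyze(terms: Iterable[str], protected: Set[str]) -> List[str]:
    """Run the marker stage first, then the normalization stage."""
    stream = (Token(t) for t in terms)
    return [t.term for t in normalize(mark_keywords(stream, protected))]
```

    With this chain, a term listed in the protected set passes through unchanged while every other term is normalized. If the two stages were swapped, the flag would be set too late to have any effect.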

    HindiNormalizationFilterFactory

    Factory for HindiNormalizationFilter.

    <fieldType name="text_hinormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.HindiNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    HindiNormalizer

    Normalizer for Hindi.

    Normalizes text to remove some spelling variations.

    Implements the Hindi-specific algorithm described in "Word normalization in Indian languages" by Prasad Pingali and Vasudeva Varma (http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf),

    with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):

    • Internal zero-width joiners and zero-width non-joiners are removed.
    • In addition to chandrabindu, NA+halant is normalized to anusvara.
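
    The two additions listed above can be sketched in a few lines of Python. This is a standalone illustration, not the HindiNormalizer implementation: it covers only these two rules, the function name normalize_additions is hypothetical, and as a simplification it removes every zero-width joiner or non-joiner in the token rather than only internal ones.

```python
ZWJ = "\u200d"           # zero-width joiner
ZWNJ = "\u200c"          # zero-width non-joiner
CHANDRABINDU = "\u0901"  # ँ
ANUSVARA = "\u0902"      # ं
NA = "\u0928"            # न
HALANT = "\u094d"        # ् (virama)


def normalize_additions(token: str) -> str:
    """Apply only the two additions described above to a single token."""
    # 1. Drop zero-width joiners and non-joiners.
    token = token.replace(ZWJ, "").replace(ZWNJ, "")
    # 2. Fold NA+halant and chandrabindu into anusvara.
    token = token.replace(NA + HALANT, ANUSVARA)
    return token.replace(CHANDRABINDU, ANUSVARA)


if __name__ == "__main__":
    # हिन्दी (with NA+halant) becomes हिंदी (with anusvara).
    print(normalize_additions("\u0939\u093f\u0928\u094d\u0926\u0940"))
```

    Under these rules the spellings हिन्दी and हिंदी produce the same term, which is the kind of variation the normalizer is meant to collapse.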

    HindiStemFilter

    A Lucene.Net.Analysis.TokenFilter that applies HindiStemmer to stem Hindi words.

    HindiStemFilterFactory

    Factory for HindiStemFilter.

    <fieldType name="text_histem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.HindiStemFilterFactory"/>
      </analyzer>
    </fieldType>

    HindiStemmer

    Light stemmer for Hindi.

    Implements the algorithm described in "A Lightweight Stemmer for Hindi" by Ananthakrishnan Ramanathan and Durgesh D Rao: http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf
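
    As a rough illustration of what a light, suffix-stripping stemmer does, the Python sketch below removes the longest matching suffix from a word. It is not the HindiStemmer implementation: the suffix list is a small illustrative subset chosen for this example, and both the minimum stem length of two code points and the name light_stem are assumptions.

```python
# Illustrative subset only; a real suffix list is considerably longer.
SUFFIXES = [
    "\u093f\u092f\u094b\u0902",  # ियों
    "\u094b\u0902",              # ों
    "\u0947\u0902",              # ें
    "\u0940",                    # ी
    "\u093e",                    # ा
]
MIN_STEM_LENGTH = 2  # assumption: keep at least two code points of stem


def light_stem(word: str) -> str:
    """Strip the longest matching suffix if enough of the word remains."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word
```

    With this subset, the plural forms किताबों and किताबें both reduce to किताब, while a very short word such as का is left unchanged because too little of it would remain.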

    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.