
    Namespace Lucene.Net.Analysis.In

    Analysis components for Indian languages.

    Classes

    IndicNormalizationFilter

    A TokenFilter that applies IndicNormalizer to normalize text in Indian languages.
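
    As an illustration of how this filter might be wired into an analysis chain in code, the following is a minimal sketch rather than a definitive usage pattern. It assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package referenced, and it assumes StandardTokenizer as the upstream tokenizer; the class name IndicNormalizationDemo and the sample text are hypothetical.

    ```csharp
    using System;
    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.In;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Analysis.TokenAttributes;
    using Lucene.Net.Util;

    // Hypothetical demo class; not part of Lucene.NET.
    public static class IndicNormalizationDemo
    {
        public static void Main()
        {
            // Arbitrary sample input in an Indic script (assumption for illustration).
            string text = "हिंदी भाषा";

            // Assumed chain: StandardTokenizer -> IndicNormalizationFilter.
            Tokenizer tokenizer = new StandardTokenizer(
                LuceneVersion.LUCENE_48, new StringReader(text));

            using (TokenStream stream = new IndicNormalizationFilter(tokenizer))
            {
                // The term attribute exposes the text of the current token.
                ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

                stream.Reset();                  // required before the first IncrementToken()
                while (stream.IncrementToken())  // advance one token at a time
                {
                    Console.WriteLine(term.ToString());
                }
                stream.End();                    // finalize end-of-stream state
            }
        }
    }
    ```

    Disposing the outermost TokenStream also releases the wrapped tokenizer and its reader, which is why only the filter sits in the using block.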

    IndicNormalizationFilterFactory

    Factory for IndicNormalizationFilter. Example Solr field type configuration:

    <fieldType name="text_innormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.IndicNormalizationFilterFactory"/>
      </analyzer>
    </fieldType>

    IndicNormalizer

    Normalizes the Unicode representation of text in Indian languages.

    Follows the guidelines in Unicode 5.2, chapter 6, "South Asian Scripts I", and the graphical decompositions listed at http://ldc.upenn.edu/myl/IndianScriptsUnicode.html.

    IndicTokenizer

    Simple Tokenizer for text in Indian languages.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)