
    Class HindiNormalizer

    Normalizer for Hindi.

    Normalizes text to remove some differences between spelling variants.

    Implements the Hindi-language-specific algorithm described in "Word Normalization in Indian Languages" by Prasad Pingali and Vasudeva Varma: http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf

    with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):

    • Internal zero-width joiners (ZWJ) and zero-width non-joiners (ZWNJ) are removed.
    • In addition to chandrabindu, NA + halant is normalized to anusvara.
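    The two additions listed above, together with the chandrabindu rule they extend, can be illustrated with a small self-contained C# sketch. It is not the Lucene.Net implementation: the class HindiNormalizerSketch and its Normalize and Delete members are hypothetical names, and only these three rules are covered (the full normalizer applies further mappings from the Pingali and Varma algorithm). It does follow the same contract as Normalize(Char[], Int32): the buffer is modified in place and the new length is returned.

```csharp
using System;

// Hypothetical, partial sketch: covers only ZWJ/ZWNJ removal,
// chandrabindu -> anusvara, and NA + halant -> anusvara.
public static class HindiNormalizerSketch
{
    private const char Na = '\u0928';           // DEVANAGARI LETTER NA
    private const char Halant = '\u094D';       // DEVANAGARI SIGN VIRAMA (halant)
    private const char Chandrabindu = '\u0901'; // DEVANAGARI SIGN CANDRABINDU
    private const char Anusvara = '\u0902';     // DEVANAGARI SIGN ANUSVARA
    private const char Zwnj = '\u200C';         // ZERO WIDTH NON-JOINER
    private const char Zwj = '\u200D';          // ZERO WIDTH JOINER

    // Normalizes s[0..len) in place and returns the new length.
    public static int Normalize(char[] s, int len)
    {
        for (int i = 0; i < len; i++)
        {
            switch (s[i])
            {
                case Zwj:
                case Zwnj:
                    // Drop the joiner, then re-examine the character shifted into slot i.
                    len = Delete(s, i, len);
                    i--;
                    break;
                case Chandrabindu:
                    s[i] = Anusvara;
                    break;
                case Na:
                    // "Dead" NA (NA followed by halant) becomes a single anusvara.
                    if (i + 1 < len && s[i + 1] == Halant)
                    {
                        s[i] = Anusvara;
                        len = Delete(s, i + 1, len);
                    }
                    break;
            }
        }
        return len;
    }

    // Removes s[pos] by shifting the tail one slot left; returns the shortened length.
    // Array.Copy handles overlapping source and destination ranges correctly.
    private static int Delete(char[] s, int pos, int len)
    {
        if (pos < len - 1)
        {
            Array.Copy(s, pos + 1, s, pos, len - pos - 1);
        }
        return len - 1;
    }
}
```

    For example, normalizing the buffer for "हिन्दी" ("\u0939\u093F\u0928\u094D\u0926\u0940", length 6) returns 5, and the first five characters of the buffer then spell "हिंदी".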

    Inheritance
    System.Object
    HindiNormalizer
    Namespace: Lucene.Net.Analysis.Hi
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class HindiNormalizer : object

    Methods


    Normalize(Char[], Int32)

    Normalizes an input buffer of Hindi text in place.

    Declaration
    public virtual int Normalize(char[] s, int len)
    Parameters
    Type            Name   Description
    System.Char[]   s      input buffer, modified in place
    System.Int32    len    length of the input (number of characters in s to normalize)

    Returns
    Type            Description
    System.Int32    length of the text in s after normalization (may be less than len)
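    Because normalization can delete characters (for instance a zero-width joiner), the returned value may be smaller than len, and only that many leading characters of s hold normalized text; whatever follows is left-over data. The C# sketch below illustrates this contract without depending on Lucene.Net. BufferContractDemo, RemoveJoiners, and Normalized are hypothetical names, and RemoveJoiners is a stand-in that implements only ZWJ/ZWNJ removal with a read/write compaction loop; it is not the library's code.

```csharp
using System;

public static class BufferContractDemo
{
    // Hypothetical stand-in with the same shape as Normalize(char[] s, int len):
    // it only removes ZERO WIDTH JOINER / NON-JOINER, compacting s in place.
    public static int RemoveJoiners(char[] s, int len)
    {
        int write = 0;
        for (int read = 0; read < len; read++)
        {
            char c = s[read];
            if (c != '\u200D' && c != '\u200C')
            {
                s[write++] = c;
            }
        }
        // New length; s[write..len) still holds stale characters.
        return write;
    }

    // Reads back only the valid prefix, as a caller of Normalize should.
    public static string Normalized(string input)
    {
        char[] buffer = input.ToCharArray();
        int newLen = RemoveJoiners(buffer, buffer.Length);
        return new string(buffer, 0, newLen);
    }
}
```

    Building a string from the whole array instead of the first newLen characters would pick up the stale tail. A caller such as a token filter would typically follow the same pattern: pass the term buffer and its current length, then treat the return value as the term's new length.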

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)