Class HindiNormalizer

Normalizer for Hindi.

Normalizes text to remove some spelling variations.

Implements the Hindi-specific algorithm described in "Word normalization in Indian languages" by Prasad Pingali and Vasudeva Varma (http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf),

with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):

Internal zero-width joiners (ZWJ) and zero-width non-joiners (ZWNJ) are removed.
In addition to chandrabindu, NA+halant is normalized to anusvara.
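
The two additions listed above can be illustrated with a small standalone sketch. It is not the Lucene.NET source: the class name HindiNormalizationSketch and the method ApplyAdditions are hypothetical, it covers only these two rules rather than the full algorithm, and it assumes that every ZWJ/ZWNJ in the buffer is dropped regardless of position.

```csharp
using System;

// Hypothetical illustration of the two listed additions only; not the Lucene.NET implementation.
public static class HindiNormalizationSketch
{
    private const char Chandrabindu = '\u0901';
    private const char Anusvara = '\u0902';
    private const char Na = '\u0928';
    private const char Halant = '\u094D';
    private const char Zwnj = '\u200C';
    private const char Zwj = '\u200D';

    // Removes the character at pos by shifting the tail left; returns the new length.
    private static int Delete(char[] s, int pos, int len)
    {
        if (pos < len - 1)
        {
            Array.Copy(s, pos + 1, s, pos, len - pos - 1);
        }
        return len - 1;
    }

    // Rewrites the first len characters of s in place and returns the new length.
    public static int ApplyAdditions(char[] s, int len)
    {
        for (int i = 0; i < len; i++)
        {
            switch (s[i])
            {
                case Na:
                    // NA followed by halant becomes a single anusvara.
                    if (i + 1 < len && s[i + 1] == Halant)
                    {
                        s[i] = Anusvara;
                        len = Delete(s, i + 1, len);
                    }
                    break;
                case Chandrabindu:
                    s[i] = Anusvara;
                    break;
                case Zwj:
                case Zwnj:
                    // Assumption: every joiner/non-joiner is dropped, wherever it occurs.
                    len = Delete(s, i, len);
                    i--; // re-examine the character that moved into this slot
                    break;
            }
        }
        return len;
    }
}
```

Because the buffer is edited in place, characters beyond the returned length are stale and should be ignored by the caller.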

Inheritance

object

HindiNormalizer

Inherited Members

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Hi

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public class HindiNormalizer

Methods

Normalize(char[], int)

Normalizes an input buffer of Hindi text in place.

Declaration

public virtual int Normalize(char[] s, int len)

Parameters

Type	Name	Description
char[]	s	input buffer
int	len	length of input buffer

Returns

Type	Description
int	length of input buffer after normalization
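
A minimal usage sketch for the method above, assuming the Lucene.Net.Analysis.Common package is referenced. The wrapper class HindiNormalizerUsage and its method NormalizeText are hypothetical names introduced for illustration; only HindiNormalizer and Normalize(char[], int) come from this page.

```csharp
using System;
using Lucene.Net.Analysis.Hi;

// Hypothetical helper showing one way to call Normalize on a string.
public static class HindiNormalizerUsage
{
    public static string NormalizeText(string text)
    {
        var normalizer = new HindiNormalizer();

        // Normalize works on a char buffer and rewrites it in place.
        char[] buffer = text.ToCharArray();
        int newLength = normalizer.Normalize(buffer, buffer.Length);

        // Only the first newLength characters are valid after the call.
        return new string(buffer, 0, newLength);
    }
}
```

The returned length can be smaller than the input length, so the result string is built from the returned length instead of the full buffer.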