Class HindiNormalizer
Normalizer for Hindi.
Normalizes text to remove some differences in spelling variations.
Implements the Hindi-language specific algorithm specified in:
Word normalization in Indian languages
Prasad Pingali and Vasudeva Varma.
http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf
with the following additions from Hindi CLIR in Thirty Days
Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel.
http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454:
- Internal zero-width joiners and zero-width non-joiners are removed
- In addition to chandrabindu, NA+halant is normalized to anusvara
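The two additions listed above can be illustrated with a small standalone sketch. It is not the library's source: `HindiAdditionsSketch` and `Apply` are hypothetical names, the code points (U+200D, U+200C, U+0901, U+0902, U+0928, U+094D) are the standard Unicode values for the characters named in the list, and the spelling rules from the first paper are deliberately left out.

```csharp
using System;

// Hypothetical illustration only; this is NOT the HindiNormalizer implementation.
public static class HindiAdditionsSketch
{
    private const char ZeroWidthJoiner = '\u200D';
    private const char ZeroWidthNonJoiner = '\u200C';
    private const char Chandrabindu = '\u0901';
    private const char Anusvara = '\u0902';
    private const char Na = '\u0928';
    private const char Halant = '\u094D';

    // Rewrites the first len chars of s in place and returns the new length.
    public static int Apply(char[] s, int len)
    {
        int outLen = 0; // write position; never runs ahead of the read position i
        for (int i = 0; i < len; i++)
        {
            char c = s[i];
            if (c == ZeroWidthJoiner || c == ZeroWidthNonJoiner)
            {
                continue; // drop the joiner or non-joiner
            }
            if (c == Chandrabindu)
            {
                s[outLen++] = Anusvara; // chandrabindu becomes anusvara
                continue;
            }
            if (c == Na && i + 1 < len && s[i + 1] == Halant)
            {
                s[outLen++] = Anusvara; // NA + halant collapses to one anusvara
                i++;                    // skip the halant
                continue;
            }
            s[outLen++] = c; // everything else is copied unchanged
        }
        return outLen;
    }
}
```

Because characters can be dropped or merged, the returned length may be smaller than `len`; only the first `outLen` characters of the buffer are meaningful afterwards.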
Inheritance
System.Object
    HindiNormalizer
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Lucene.Net.Analysis.Hi
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class HindiNormalizer
Methods
Normalize(Char[], Int32)
Normalize an input buffer of Hindi text
Declaration
public virtual int Normalize(char[] s, int len)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Char[] | s | input buffer | 
| System.Int32 | len | length of input buffer | 
Returns
| Type | Description | 
|---|---|
| System.Int32 | length of input buffer after normalization |
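
A minimal usage sketch, assuming the Lucene.Net.Analysis.Common assembly is referenced. `HindiNormalizerExample` and `NormalizeHindi` are hypothetical helper names introduced for illustration; only `HindiNormalizer` and `Normalize(char[], int)` come from this page.

```csharp
using Lucene.Net.Analysis.Hi;

// Hypothetical wrapper, not part of Lucene.NET.
public static class HindiNormalizerExample
{
    public static string NormalizeHindi(string text)
    {
        var normalizer = new HindiNormalizer();

        // Normalize works on a char buffer and reports how many
        // characters of that buffer are valid afterwards.
        char[] buffer = text.ToCharArray();
        int newLength = normalizer.Normalize(buffer, buffer.Length);

        // Build the result from the valid prefix only.
        return new string(buffer, 0, newLength);
    }
}
```

Calling `HindiNormalizerExample.NormalizeHindi(someHindiText)` returns the normalized string. Use the returned length rather than `buffer.Length` when reading the buffer, since normalization can shorten the text.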