Class SoraniNormalizer
Normalizes the Unicode representation of Sorani text.
Normalization consists of:
- Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
- Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
- Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
- Alternate (joining) form of 'h' (06BE) is converted to 0647
- Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
- Harakat, tatweel, and formatting characters such as directional controls are removed.
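The one-to-one replacements in the list above can be illustrated with a small standalone sketch. This is not the library's implementation: the class name `SoraniRuleSketch` and the method `Apply` are hypothetical, the sketch assumes the alternate 'y' forms are U+064A and U+0649, and it covers only the context-free substitutions plus tatweel removal. The context-dependent rules (0647+200C, word-final 0647, word-initial 0631) and the removal of harakat and formatting characters are deliberately left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the SoraniNormalizer source.
public static class SoraniRuleSketch
{
    // Context-free replacements taken from the rule list above.
    private static readonly Dictionary<char, char> Replacements = new Dictionary<char, char>
    {
        { '\u064A', '\u06CC' }, // ARABIC YEH        -> FARSI YEH
        { '\u0649', '\u06CC' }, // ALEF MAKSURA      -> FARSI YEH
        { '\u0643', '\u06A9' }, // ARABIC KAF        -> KEHEH
        { '\u0629', '\u06D5' }, // TEH MARBUTA       -> AE
        { '\u06BE', '\u0647' }, // HEH DOACHASHMEE   -> HEH
        { '\u0692', '\u0695' }, // REH WITH SMALL V  -> REH WITH SMALL V BELOW
    };

    private const char Tatweel = '\u0640';

    // Rewrites the first len chars of s in place and returns the new length.
    public static int Apply(char[] s, int len)
    {
        int write = 0;
        for (int read = 0; read < len; read++)
        {
            char c = s[read];
            if (c == Tatweel)
            {
                continue; // removed characters simply are not copied forward
            }
            s[write++] = Replacements.TryGetValue(c, out char mapped) ? mapped : c;
        }
        return write;
    }
}
```

For example, applying `SoraniRuleSketch.Apply` to the three-character buffer `"\u0643\u0640\u064A"` leaves `"\u06A9\u06CC"` in the first two slots and returns 2. Because every rule either replaces or deletes a character, the result never outgrows the input buffer, which is why an in-place signature returning the new length is sufficient.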
Inheritance
System.Object
    SoraniNormalizer
Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
Namespace: Lucene.Net.Analysis.Ckb
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class SoraniNormalizer
Methods
Normalize(Char[], Int32)
Normalizes an input buffer of Sorani text.
Declaration
public virtual int Normalize(char[] s, int len)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Char[] | s | input buffer | 
| System.Int32 | len | length of input buffer | 
Returns
| Type | Description | 
|---|---|
| System.Int32 | length of input buffer after normalization |
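A minimal usage sketch follows, assuming the project references the Lucene.Net.Analysis.Common assembly. The wrapper class `SoraniNormalizerExample` and its `NormalizeText` helper are hypothetical names introduced for illustration. Only the `SoraniNormalizer` class and its `Normalize(char[], int)` method come from this page.

```csharp
using System;
using Lucene.Net.Analysis.Ckb;

// Hypothetical wrapper around the documented Normalize(char[], int) method.
public static class SoraniNormalizerExample
{
    public static string NormalizeText(string input)
    {
        var normalizer = new SoraniNormalizer();

        // Normalize works on a char buffer, so copy the string first.
        char[] buffer = input.ToCharArray();

        // The return value is the length of the buffer after normalization;
        // characters past that index are stale and must be ignored.
        int newLength = normalizer.Normalize(buffer, buffer.Length);

        return new string(buffer, 0, newLength);
    }

    public static void Main()
    {
        // Per the rules listed for this class, U+0643 should become U+06A9 (KEHEH).
        string normalized = NormalizeText("\u0643");
        Console.WriteLine(normalized == "\u06A9");
    }
}
```

The important detail is to build the result from only the first `newLength` characters, since removed characters (harakat, tatweel, formatting controls) shorten the valid region of the buffer.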