    Class SoraniNormalizer

    Normalizes the Unicode representation of Sorani text.

    Normalization consists of:

    • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
    • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
    • Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
    • Alternate (joining) form of 'h' (06BE) is converted to 0647
    • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
    • Harakat, tatweel, and formatting characters such as directional controls are removed.
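As a rough illustration of how rules like these can be applied to a character buffer in place, here is a minimal sketch. It is not the Lucene.Net implementation: `SoraniSketch` and `NormalizeSimple` are hypothetical names. It covers only a subset of the context-free rules (the 0649, 0643, 0629, 06BE and 0692 mappings, plus tatweel removal). The context-dependent rules (0647+200C, word-final 0647, word-initial 0631) and the removal of harakat and formatting characters are deliberately left out.

```csharp
using System;

public static class SoraniSketch
{
    // Hypothetical helper, NOT part of Lucene.Net. Applies a subset of the
    // context-free rules listed above, rewriting the buffer in place and
    // returning the number of characters that remain valid.
    public static int NormalizeSimple(char[] s, int len)
    {
        int outLen = 0; // write position; never runs ahead of the read position
        for (int i = 0; i < len; i++)
        {
            char c = s[i];
            switch (c)
            {
                case '\u0649': // alternate 'y' (dotless yeh)
                    s[outLen++] = '\u06CC'; // FARSI YEH
                    break;
                case '\u0643': // alternate 'k'
                    s[outLen++] = '\u06A9'; // KEHEH
                    break;
                case '\u0629': // alternate vowel 'e'
                    s[outLen++] = '\u06D5'; // AE
                    break;
                case '\u06BE': // joining form of 'h'
                    s[outLen++] = '\u0647';
                    break;
                case '\u0692': // alternate 'rr'
                    s[outLen++] = '\u0695'; // REH WITH SMALL V BELOW
                    break;
                case '\u0640': // tatweel: dropped, nothing is written
                    break;
                default:
                    s[outLen++] = c; // everything else is copied unchanged
                    break;
            }
        }
        return outLen;
    }
}
```

With this sketch, a buffer holding 0643, 0640, 0649 comes back with length 2 and the contents 06A9, 06CC. The tatweel is removed, so the result is shorter than the input.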

    Inheritance
    System.Object
    SoraniNormalizer
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Ckb
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class SoraniNormalizer

    Methods


    Normalize(Char[], Int32)

    Normalizes an input buffer of Sorani text.

    Declaration
    public virtual int Normalize(char[] s, int len)
    Parameters
    Type           Name  Description
    System.Char[]  s     input buffer
    System.Int32   len   length of input buffer

    Returns
    Type          Description
    System.Int32  length of input buffer after normalization
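A minimal usage sketch follows. It assumes the project references the Lucene.Net.Analysis.Common assembly and that `SoraniNormalizer` has a public parameterless constructor, which this page does not list explicitly. `SoraniNormalizerExample` and `NormalizeText` are hypothetical names used only for illustration. Based on the return description above, the sketch also assumes the buffer is modified in place, so that only the first returned-length characters are meaningful afterwards.

```csharp
using System;
using Lucene.Net.Analysis.Ckb;

public static class SoraniNormalizerExample
{
    // Hypothetical wrapper: copies the string into a buffer, normalizes it,
    // and builds a new string from the valid prefix of that buffer.
    public static string NormalizeText(string input)
    {
        // Assumption: a public parameterless constructor is available.
        var normalizer = new SoraniNormalizer();
        char[] buffer = input.ToCharArray();

        // Normalize returns the length of the buffer after normalization;
        // anything past that index should be treated as stale.
        int newLength = normalizer.Normalize(buffer, buffer.Length);
        return new string(buffer, 0, newLength);
    }
}
```

Building the result from `newLength` rather than `buffer.Length` matters because removing characters such as tatweel can make the normalized text shorter than the input.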

    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.