
    Class SoraniNormalizer

    Normalizes the Unicode representation of Sorani text.

    Normalization consists of:
    • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
    • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
    • Alternate forms of vowel 'e' (0647 followed by 200C ZERO WIDTH NON-JOINER, word-final 0647, 0629) are converted to 06D5 (AE)
    • Alternate (joining) form of 'h' (06BE) is converted to 0647 (HEH)
    • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
    • Harakat (064B to 0652), tatweel (0640), and formatting characters such as directional controls are removed.
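The rules listed above can be sketched as a single pass over one token. This is a hypothetical reconstruction for illustration, not the Lucene.Net source: the names `SoraniRulesSketch` and `Apply` are invented here, the sketch works on a `string` rather than the `char[]` buffer that `Normalize` takes, and it assumes "word-initial" and "word-final" mean the start and end of the input (that is, the input is a single token).

```csharp
using System.Globalization;
using System.Text;

// Hypothetical sketch of the documented Sorani rules; NOT the Lucene.Net implementation.
// Assumption: the input is one token, so word-initial/word-final = start/end of the string.
public static class SoraniRulesSketch
{
    const char Yeh = '\u064A', AlefMaksura = '\u0649', FarsiYeh = '\u06CC';
    const char Kaf = '\u0643', Keheh = '\u06A9';
    const char Heh = '\u0647', Zwnj = '\u200C', TehMarbuta = '\u0629', Ae = '\u06D5';
    const char HehDoachashmee = '\u06BE';
    const char Reh = '\u0631', RehSmallV = '\u0692', Rreh = '\u0695';
    const char Tatweel = '\u0640';

    public static string Apply(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            switch (c)
            {
                case Yeh:
                case AlefMaksura:
                    sb.Append(FarsiYeh);
                    break;
                case Kaf:
                    sb.Append(Keheh);
                    break;
                case Zwnj:
                    // HEH + ZWNJ becomes AE; the ZWNJ itself is dropped.
                    if (sb.Length > 0 && sb[sb.Length - 1] == Heh)
                        sb[sb.Length - 1] = Ae;
                    break;
                case Heh:
                    // Only a HEH in the last position becomes AE.
                    sb.Append(i == text.Length - 1 ? Ae : Heh);
                    break;
                case TehMarbuta:
                    sb.Append(Ae);
                    break;
                case HehDoachashmee:
                    sb.Append(Heh);
                    break;
                case Reh:
                    // Only a REH with nothing kept before it becomes RREH.
                    sb.Append(sb.Length == 0 ? Rreh : Reh);
                    break;
                case RehSmallV:
                    sb.Append(Rreh);
                    break;
                case Tatweel:
                    break; // removed
                default:
                    {
                        // Harakat (064B..0652) and format characters (category Cf) are removed.
                        bool isHaraka = c >= '\u064B' && c <= '\u0652';
                        bool isFormat = char.GetUnicodeCategory(c) == UnicodeCategory.Format;
                        if (!isHaraka && !isFormat)
                            sb.Append(c);
                        break;
                    }
            }
        }
        return sb.ToString();
    }
}
```

Building a new string keeps the sketch short; the documented `Normalize(char[], int)` instead rewrites the caller's buffer and reports the new length.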
    Inheritance
    object
    SoraniNormalizer
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Ckb
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class SoraniNormalizer

    Methods

    Normalize(char[], int)

    Normalizes an input buffer of Sorani text in place.

    Declaration
    public virtual int Normalize(char[] s, int len)
    Parameters
    Type    Name  Description
    char[]  s     input buffer; modified in place
    int     len   number of valid characters at the start of s

    Returns
    Type  Description
    int   length of the valid text in s after normalization; may be less than len, because removed characters shift the remaining ones left
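A caller holding a string has to copy it into a buffer, pass the valid length, and read back only the first returned-length characters. The following sketch illustrates that contract with hypothetical names (`NormalizeBufferSketch`, `NormalizeToString`, `RemoveTatweel`). The `Func<char[], int, int>` parameter stands in for `SoraniNormalizer.Normalize`, and `RemoveTatweel` is a toy normalizer that applies only one removal rule, so the example runs without the Lucene.Net packages.

```csharp
using System;

// Hypothetical helper showing how a caller consumes a Normalize(char[] s, int len)
// style method: afterward, only the first N characters are valid, where N is the return value.
public static class NormalizeBufferSketch
{
    // 'normalize' has the same shape as SoraniNormalizer.Normalize: (buffer, length) -> new length.
    public static string NormalizeToString(Func<char[], int, int> normalize, string text)
    {
        // Deliberately oversized to show that len, not buffer.Length, marks the end of the text.
        char[] buffer = new char[text.Length + 8];
        text.CopyTo(0, buffer, 0, text.Length);
        int newLen = normalize(buffer, text.Length);
        // Characters at index >= newLen are leftovers and must be ignored.
        return new string(buffer, 0, newLen);
    }

    // Toy in-place normalizer (hypothetical): removes tatweel (0640) by shifting the
    // remaining characters left and returning the shorter length.
    public static int RemoveTatweel(char[] s, int len)
    {
        int write = 0;
        for (int read = 0; read < len; read++)
        {
            if (s[read] != '\u0640')
                s[write++] = s[read];
        }
        return write;
    }
}
```

Assuming the Lucene.Net.Analysis.Common package is referenced, passing `new SoraniNormalizer().Normalize` as the delegate should apply the full rule set instead of the toy one. Reading past the returned length would pick up stale characters left behind by the left shift.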

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.