    Class ArabicNormalizer

    Normalizer for Arabic.

    Normalization is done in place for efficiency, operating on a term buffer.

    Normalization is defined as:
    • Normalization of hamza with alef seat to a bare alef.
    • Normalization of teh marbuta to heh.
    • Normalization of dotless yeh (alef maksura) to yeh.
    • Removal of Arabic diacritics (the harakat).
    • Removal of tatweel (stretching character).
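
The rules listed above can be illustrated with a small standalone sketch. This is not the Lucene.Net source: the class name `ArabicNormalizationSketch` and the method `NormalizeInPlace` are hypothetical, and the sketch uses a simple read/write two-index compaction, which may differ from how the library removes characters. It assumes that alef with madda is folded to a bare alef along with the two hamza forms, since the class exposes an ALEF_MADDA constant.

```csharp
using System;

public static class ArabicNormalizationSketch
{
    // Hypothetical helper (not part of Lucene.Net): applies the listed rules to
    // the first `len` chars of `s` in place and returns the new logical length.
    public static int NormalizeInPlace(char[] s, int len)
    {
        int write = 0; // next position to write a kept character
        for (int read = 0; read < len; read++)
        {
            char c = s[read];
            switch (c)
            {
                case '\u0622': // alef with madda above (assumed to fold to alef)
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    s[write++] = '\u0627'; // bare alef
                    break;
                case '\u0629': // teh marbuta
                    s[write++] = '\u0647'; // heh
                    break;
                case '\u0649': // dotless yeh (alef maksura)
                    s[write++] = '\u064A'; // yeh
                    break;
                case '\u0640': // tatweel
                case '\u064B': // fathatan
                case '\u064C': // dammatan
                case '\u064D': // kasratan
                case '\u064E': // fatha
                case '\u064F': // damma
                case '\u0650': // kasra
                case '\u0651': // shadda
                case '\u0652': // sukun
                    break; // removed: nothing is written for these characters
                default:
                    s[write++] = c; // every other character is kept unchanged
                    break;
            }
        }
        return write;
    }
}
```

Because the buffer is rewritten in place, characters beyond the returned length are stale and should be ignored by the caller.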
    Inheritance
    object
    ArabicNormalizer
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Ar
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class ArabicNormalizer

    Fields

    ALEF

    Declaration
    public const char ALEF = 'ا'
    Field Value
    Type Description
    char

    ALEF_HAMZA_ABOVE

    Declaration
    public const char ALEF_HAMZA_ABOVE = 'أ'
    Field Value
    Type Description
    char

    ALEF_HAMZA_BELOW

    Declaration
    public const char ALEF_HAMZA_BELOW = 'إ'
    Field Value
    Type Description
    char

    ALEF_MADDA

    Declaration
    public const char ALEF_MADDA = 'آ'
    Field Value
    Type Description
    char

    DAMMA

    Declaration
    public const char DAMMA = 'ُ'
    Field Value
    Type Description
    char

    DAMMATAN

    Declaration
    public const char DAMMATAN = 'ٌ'
    Field Value
    Type Description
    char

    DOTLESS_YEH

    Declaration
    public const char DOTLESS_YEH = 'ى'
    Field Value
    Type Description
    char

    FATHA

    Declaration
    public const char FATHA = 'َ'
    Field Value
    Type Description
    char

    FATHATAN

    Declaration
    public const char FATHATAN = 'ً'
    Field Value
    Type Description
    char

    HEH

    Declaration
    public const char HEH = 'ه'
    Field Value
    Type Description
    char

    KASRA

    Declaration
    public const char KASRA = 'ِ'
    Field Value
    Type Description
    char

    KASRATAN

    Declaration
    public const char KASRATAN = 'ٍ'
    Field Value
    Type Description
    char

    SHADDA

    Declaration
    public const char SHADDA = 'ّ'
    Field Value
    Type Description
    char

    SUKUN

    Declaration
    public const char SUKUN = 'ْ'
    Field Value
    Type Description
    char

    TATWEEL

    Declaration
    public const char TATWEEL = 'ـ'
    Field Value
    Type Description
    char

    TEH_MARBUTA

    Declaration
    public const char TEH_MARBUTA = 'ة'
    Field Value
    Type Description
    char

    YEH

    Declaration
    public const char YEH = 'ي'
    Field Value
    Type Description
    char

    Methods

    Normalize(char[], int)

    Normalizes an input buffer of Arabic text in place.

    Declaration
    public virtual int Normalize(char[] s, int len)
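    Parameters
    Type      Name   Description
    char[]    s      Input buffer holding the term to normalize; modified in place.
    int       len    Length of the input in the buffer (number of valid characters in s).

    Returns
    Type      Description
    int       Length of the input in the buffer after normalization.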
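
A minimal usage sketch follows, assuming the project references the Lucene.Net.Analysis.Common assembly and that `ArabicNormalizer` is created with its implicit parameterless constructor (this page lists no constructors). The wrapper class `ArabicNormalizerDemo` and its `NormalizeTerm` method are hypothetical names used only for illustration. The point it shows is that the return value, not the array length, marks the end of the normalized text.

```csharp
using System;
using Lucene.Net.Analysis.Ar;

public static class ArabicNormalizerDemo
{
    // Hypothetical convenience wrapper: copies the term into a char buffer,
    // normalizes it in place, and builds a string from the valid prefix only.
    public static string NormalizeTerm(string term)
    {
        char[] buffer = term.ToCharArray();
        var normalizer = new ArabicNormalizer();

        // Normalize rewrites the buffer and returns the new logical length.
        // Characters at or beyond that index are leftovers and must be ignored.
        int newLength = normalizer.Normalize(buffer, buffer.Length);

        return new string(buffer, 0, newLength);
    }
}
```

In an analysis chain the buffer would typically be the token's term buffer rather than a fresh array, with the token length updated to the returned value after the call.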

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.