Class ArabicNormalizer
Normalizer for Arabic.
Normalization is done in place for efficiency, operating on a term buffer.
Normalization is defined as:
- Normalization of hamza with alef seat to a bare alef.
- Normalization of teh marbuta to heh.
- Normalization of dotless yeh (alef maksura) to yeh.
- Removal of Arabic diacritics (the harakat).
- Removal of tatweel (stretching character).
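The rules above can be sketched as a single pass over the buffer. This is an illustrative re-implementation written for this page, not the library's source: the class name `ArabicNormalizationSketch` is hypothetical, and the characters are written as Unicode escapes so that the combining marks stay readable.

```csharp
using System;

// Hypothetical stand-alone sketch of the normalization rules listed above.
public static class ArabicNormalizationSketch
{
    // Rewrites the first `len` chars of `s` in place and returns the new length.
    public static int Normalize(char[] s, int len)
    {
        int write = 0; // next position to write a kept character
        for (int read = 0; read < len; read++)
        {
            char c = s[read];
            switch (c)
            {
                case '\u0622': // alef with madda above
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    s[write++] = '\u0627'; // bare alef
                    break;
                case '\u0649': // dotless yeh (alef maksura)
                    s[write++] = '\u064A'; // yeh
                    break;
                case '\u0629': // teh marbuta
                    s[write++] = '\u0647'; // heh
                    break;
                case '\u0640': // tatweel
                case '\u064B': // fathatan
                case '\u064C': // dammatan
                case '\u064D': // kasratan
                case '\u064E': // fatha
                case '\u064F': // damma
                case '\u0650': // kasra
                case '\u0651': // shadda
                case '\u0652': // sukun
                    break; // removed: nothing is written
                default:
                    s[write++] = c; // everything else is kept unchanged
                    break;
            }
        }
        return write;
    }
}
```

Because removed characters only ever shrink the content, the write position never overtakes the read position, so the pass needs no second buffer.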
Inheritance
System.Object
ArabicNormalizer
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class ArabicNormalizer
Fields
ALEF
Declaration
public const char ALEF = 'ا'
Field Value
Type | Description
System.Char | U+0627 ARABIC LETTER ALEF
ALEF_HAMZA_ABOVE
Declaration
public const char ALEF_HAMZA_ABOVE = 'أ'
Field Value
Type | Description
System.Char | U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_HAMZA_BELOW
Declaration
public const char ALEF_HAMZA_BELOW = 'إ'
Field Value
Type | Description
System.Char | U+0625 ARABIC LETTER ALEF WITH HAMZA BELOW
ALEF_MADDA
Declaration
public const char ALEF_MADDA = 'آ'
Field Value
Type | Description
System.Char | U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE
DAMMA
Declaration
public const char DAMMA = 'ُ'
Field Value
Type | Description
System.Char | U+064F ARABIC DAMMA
DAMMATAN
Declaration
public const char DAMMATAN = 'ٌ'
Field Value
Type | Description
System.Char | U+064C ARABIC DAMMATAN
DOTLESS_YEH
Declaration
public const char DOTLESS_YEH = 'ى'
Field Value
Type | Description
System.Char | U+0649 ARABIC LETTER ALEF MAKSURA
FATHA
Declaration
public const char FATHA = 'َ'
Field Value
Type | Description
System.Char | U+064E ARABIC FATHA
FATHATAN
Declaration
public const char FATHATAN = 'ً'
Field Value
Type | Description
System.Char | U+064B ARABIC FATHATAN
HEH
Declaration
public const char HEH = 'ه'
Field Value
Type | Description
System.Char | U+0647 ARABIC LETTER HEH
KASRA
Declaration
public const char KASRA = 'ِ'
Field Value
Type | Description
System.Char | U+0650 ARABIC KASRA
KASRATAN
Declaration
public const char KASRATAN = 'ٍ'
Field Value
Type | Description
System.Char | U+064D ARABIC KASRATAN
SHADDA
Declaration
public const char SHADDA = 'ّ'
Field Value
Type | Description
System.Char | U+0651 ARABIC SHADDA
SUKUN
Declaration
public const char SUKUN = 'ْ'
Field Value
Type | Description
System.Char | U+0652 ARABIC SUKUN
TATWEEL
Declaration
public const char TATWEEL = 'ـ'
Field Value
Type | Description
System.Char | U+0640 ARABIC TATWEEL
TEH_MARBUTA
Declaration
public const char TEH_MARBUTA = 'ة'
Field Value
Type | Description
System.Char | U+0629 ARABIC LETTER TEH MARBUTA
YEH
Declaration
public const char YEH = 'ي'
Field Value
Type | Description
System.Char | U+064A ARABIC LETTER YEH
Methods
Normalize(Char[], Int32)
Normalizes an input buffer of Arabic text in place.
Declaration
public virtual int Normalize(char[] s, int len)
Parameters
Type | Name | Description
System.Char[] | s | input buffer, modified in place
System.Int32 | len | length of the input buffer (number of characters to process)
Returns
Type | Description
System.Int32 | length of input buffer after normalization
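Since the buffer is modified in place and only the returned length is valid afterwards, a caller typically rebuilds the string from the first returned-length characters. The sketch below illustrates that pattern; `NormalizeUsageSketch` and `NormalizeToString` are hypothetical names, and the `normalize` delegate stands in for any method with the shape `int Normalize(char[] s, int len)` so that the example runs without a library reference.

```csharp
using System;

// Hypothetical helper showing how to consume the (buffer, new length) result.
public static class NormalizeUsageSketch
{
    // `normalize` is assumed to rewrite the buffer in place and return the new length.
    public static string NormalizeToString(Func<char[], int, int> normalize, string text)
    {
        char[] buffer = text.ToCharArray();             // work on a private copy of the input
        int newLen = normalize(buffer, buffer.Length);  // buffer contents change in place
        return new string(buffer, 0, newLen);           // ignore stale chars at and past newLen
    }
}
```

With the library referenced, an instance method such as `new ArabicNormalizer().Normalize` should be usable as the delegate argument, since its declared signature matches `Func<char[], int, int>`.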