Class ArabicNormalizer
Normalizer for Arabic.
Normalization is done in-place for efficiency, operating on a term buffer. Normalization is defined as:
- Normalization of hamza with alef seat to a bare alef.
- Normalization of teh marbuta to heh.
- Normalization of dotless yeh (alef maksura) to yeh.
- Removal of Arabic diacritics (the harakat).
- Removal of tatweel (stretching character).
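The rules above can be sketched as a single pass over a character buffer. This is a minimal illustration, not the library's source: the class name `ArabicNormalizeSketch` is hypothetical, the code points are the standard Unicode values for the characters listed under Fields, and folding alef madda into bare alef is an assumption based on the `ALEF_MADDA` constant being declared alongside the hamza forms.

```csharp
using System;

// Hypothetical stand-alone sketch; not part of Lucene.Net.
public static class ArabicNormalizeSketch
{
    // Rewrites s[0..len) in place and returns the new logical length.
    public static int Normalize(char[] s, int len)
    {
        int write = 0; // next position to write a kept character
        for (int read = 0; read < len; read++)
        {
            char c = s[read];
            switch (c)
            {
                case '\u0622': // alef madda (assumed to fold to alef)
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    s[write++] = '\u0627'; // bare alef
                    break;
                case '\u0629': // teh marbuta
                    s[write++] = '\u0647'; // heh
                    break;
                case '\u0649': // dotless yeh (alef maksura)
                    s[write++] = '\u064A'; // yeh
                    break;
                case '\u0640': // tatweel
                case '\u064B': // fathatan
                case '\u064C': // dammatan
                case '\u064D': // kasratan
                case '\u064E': // fatha
                case '\u064F': // damma
                case '\u0650': // kasra
                case '\u0651': // shadda
                case '\u0652': // sukun
                    break; // dropped: nothing is written
                default:
                    s[write++] = c; // everything else is kept unchanged
                    break;
            }
        }
        return write;
    }
}
```

Because removals only ever shorten the text, the write position never overtakes the read position, which is what makes the in-place rewrite safe without a second buffer.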
Inherited Members
Namespace: Lucene.Net.Analysis.Ar
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class ArabicNormalizer
Fields
ALEF
Declaration
public const char ALEF = 'ا'
Field Value
Type | Description |
---|---|
char |
ALEF_HAMZA_ABOVE
Declaration
public const char ALEF_HAMZA_ABOVE = 'أ'
Field Value
Type | Description |
---|---|
char |
ALEF_HAMZA_BELOW
Declaration
public const char ALEF_HAMZA_BELOW = 'إ'
Field Value
Type | Description |
---|---|
char |
ALEF_MADDA
Declaration
public const char ALEF_MADDA = 'آ'
Field Value
Type | Description |
---|---|
char |
DAMMA
Declaration
public const char DAMMA = 'ُ'
Field Value
Type | Description |
---|---|
char |
DAMMATAN
Declaration
public const char DAMMATAN = 'ٌ'
Field Value
Type | Description |
---|---|
char |
DOTLESS_YEH
Declaration
public const char DOTLESS_YEH = 'ى'
Field Value
Type | Description |
---|---|
char |
FATHA
Declaration
public const char FATHA = 'َ'
Field Value
Type | Description |
---|---|
char |
FATHATAN
Declaration
public const char FATHATAN = 'ً'
Field Value
Type | Description |
---|---|
char |
HEH
Declaration
public const char HEH = 'ه'
Field Value
Type | Description |
---|---|
char |
KASRA
Declaration
public const char KASRA = 'ِ'
Field Value
Type | Description |
---|---|
char |
KASRATAN
Declaration
public const char KASRATAN = 'ٍ'
Field Value
Type | Description |
---|---|
char |
SHADDA
Declaration
public const char SHADDA = 'ّ'
Field Value
Type | Description |
---|---|
char |
SUKUN
Declaration
public const char SUKUN = 'ْ'
Field Value
Type | Description |
---|---|
char |
TATWEEL
Declaration
public const char TATWEEL = 'ـ'
Field Value
Type | Description |
---|---|
char |
TEH_MARBUTA
Declaration
public const char TEH_MARBUTA = 'ة'
Field Value
Type | Description |
---|---|
char |
YEH
Declaration
public const char YEH = 'ي'
Field Value
Type | Description |
---|---|
char |
Methods
Normalize(char[], int)
Normalizes an input buffer of Arabic text in place.
Declaration
public virtual int Normalize(char[] s, int len)
Parameters
Type | Name | Description |
---|---|---|
char[] | s | input buffer |
int | len | length of input buffer |
Returns
Type | Description |
---|---|
int | length of input buffer after normalization |
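A possible way to call this method on a standalone string is sketched below. It assumes the `Lucene.Net.Analysis.Common` assembly is referenced; the wrapper class `NormalizeUsage` and its `NormalizeTerm` method are hypothetical names introduced only for illustration. Since the buffer is rewritten in place, only the first characters up to the returned length should be read afterwards.

```csharp
using System;
using Lucene.Net.Analysis.Ar;

// Hypothetical wrapper for illustration; not part of Lucene.Net.
public static class NormalizeUsage
{
    public static string NormalizeTerm(string term)
    {
        // Copy the term into a mutable buffer, since Normalize edits in place.
        char[] buffer = term.ToCharArray();

        var normalizer = new ArabicNormalizer();

        // The return value is the logical length after normalization;
        // characters beyond it are leftovers and should be ignored.
        int newLength = normalizer.Normalize(buffer, buffer.Length);

        return new string(buffer, 0, newLength);
    }
}
```

Inside an analysis chain the same call would typically be made on a token's term buffer rather than on a freshly allocated array, which is where the in-place design avoids extra allocations.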