Class ArabicLetterTokenizer

Tokenizer that breaks text into runs of letters and diacritics.

The standard LetterTokenizer accepts only characters in the Letter category, so it splits tokens at diacritics. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.

You must specify the required LuceneVersion compatibility when creating ArabicLetterTokenizer:

As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.

Inheritance

System.Object

AttributeSource

TokenStream

Tokenizer

CharTokenizer

LetterTokenizer

ArabicLetterTokenizer

Implements

IDisposable

Inherited Members

CharTokenizer.Normalize(Int32)

CharTokenizer.IncrementToken()

CharTokenizer.End()

CharTokenizer.Reset()

Tokenizer.m_input

Tokenizer.Dispose(Boolean)

Tokenizer.CorrectOffset(Int32)

Tokenizer.SetReader(TextReader)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

Namespace: Lucene.Net.Analysis.Ar

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public class ArabicLetterTokenizer : LetterTokenizer, IDisposable

Constructors

ArabicLetterTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)

Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.

Declaration

public ArabicLetterTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader @in)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene version to match - See LuceneVersion.
AttributeSource.AttributeFactory	factory	the attribute factory to use for this Tokenizer
TextReader	in	the input to split up into tokens

ArabicLetterTokenizer(LuceneVersion, TextReader)

Construct a new ArabicLetterTokenizer.

Declaration

public ArabicLetterTokenizer(LuceneVersion matchVersion, TextReader @in)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	LuceneVersion to match
TextReader	in	the input to split up into tokens
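A minimal usage sketch for this constructor, assuming Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package referenced. The LuceneVersion.LUCENE_48 value, the ICharTermAttribute interface and its namespace, and the sample text are assumptions that do not appear on this page; adjust them to the version you target.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace of ICharTermAttribute
using Lucene.Net.Util;

public static class ArabicLetterTokenizerDemo
{
    public static void Main()
    {
        // Sample input (assumption): two Arabic words carrying diacritics.
        string text = "كِتَاب جَدِيد";

        // Disposing the tokenizer also releases the underlying reader.
        using (var tokenizer = new ArabicLetterTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Register the term attribute before consuming the stream.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // required before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advances to the next token, if any
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                    // finalizes end-of-stream state
        }
    }
}
```

The Reset(), IncrementToken(), End() sequence follows the usual TokenStream consumption pattern; each printed line is one run of letters and nonspacing marks.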

Methods

IsTokenChar(Int32)

Accepts characters in the Letter category or the NonspacingMark category.

Declaration

protected override bool IsTokenChar(int c)

Parameters

Type	Name	Description
System.Int32	c	the character (code point) to test

Returns

Type	Description
System.Boolean	true if c is in the Letter or NonspacingMark category; otherwise, false

Overrides

LetterTokenizer.IsTokenChar(Int32)
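The rule described above can be sketched without Lucene.NET, using only System.Globalization. This is an illustration of the "Letter or NonspacingMark" check, not the library's implementation: the class and method names are hypothetical, and treating "Letter" as the five Unicode letter categories is an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class TokenCharSketch
{
    // Returns true when the code point is a letter or a nonspacing mark.
    // Note: char.ConvertFromUtf32 throws for surrogate or out-of-range values.
    public static bool IsLetterOrNonspacingMark(int codePoint)
    {
        UnicodeCategory category =
            CharUnicodeInfo.GetUnicodeCategory(char.ConvertFromUtf32(codePoint), 0);

        switch (category)
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.NonSpacingMark: // diacritics such as Arabic fatha
                return true;
            default:
                return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsLetterOrNonspacingMark(0x0628)); // ARABIC LETTER BEH: True
        Console.WriteLine(IsLetterOrNonspacingMark(0x064E)); // ARABIC FATHA: True
        Console.WriteLine(IsLetterOrNonspacingMark(0x0020)); // SPACE: False
    }
}
```

A letter-only check would return false for U+064E, which is why a plain letter tokenizer breaks a vocalized word at each diacritic.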
