Class ArabicLetterTokenizer
Tokenizer that breaks text into runs of letters and diacritics.
The standard LetterTokenizer treats diacritics as token boundaries, because they are not letters, so it splits words that contain them. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.
You must specify the required LuceneVersion compatibility when creating ArabicLetterTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.
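A minimal usage sketch of the tokenizer described above. It assumes the Lucene.NET 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) and uses `LuceneVersion.LUCENE_48` as the match version. The `ArabicLetterTokenizerDemo` class and its `Tokenize` helper are hypothetical names introduced for illustration and are not part of the library. Because the class is marked obsolete, the compiler warning is suppressed locally.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ArabicLetterTokenizerDemo
{
    // Hypothetical helper (not part of Lucene.NET): returns the tokens
    // that ArabicLetterTokenizer produces for the given text.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();

#pragma warning disable CS0618 // ArabicLetterTokenizer is marked [Obsolete]
        using (var tokenizer = new ArabicLetterTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)))
#pragma warning restore CS0618
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                   // finalize end-of-stream state
        }

        return tokens;
    }
}
```

Calling `ArabicLetterTokenizerDemo.Tokenize(...)` on vocalized Arabic text should keep each word and its diacritics together in one token. New code would normally follow the obsolete notice and use StandardTokenizer instead.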
Inheritance
System.Object
Lucene.Net.Util.AttributeSource
Lucene.Net.Analysis.TokenStream
Lucene.Net.Analysis.Tokenizer
Lucene.Net.Analysis.Util.CharTokenizer
Lucene.Net.Analysis.Core.LetterTokenizer
ArabicLetterTokenizer
Implements
System.IDisposable
Inherited Members
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Namespace: Lucene.Net.Analysis.Ar
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Obsolete("(3.1) Use StandardTokenizer instead.")]
public class ArabicLetterTokenizer : LetterTokenizer, IDisposable
Constructors
ArabicLetterTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
Declaration
public ArabicLetterTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Util.LuceneVersion | matchVersion | Lucene version to match (see the version note in the class description) |
Lucene.Net.Util.AttributeSource.AttributeFactory | factory | the attribute factory to use for this Tokenizer |
System.IO.TextReader | in | the input to split up into tokens |
ArabicLetterTokenizer(LuceneVersion, TextReader)
Construct a new ArabicLetterTokenizer.
Declaration
public ArabicLetterTokenizer(LuceneVersion matchVersion, TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Util.LuceneVersion | matchVersion | Lucene version to match (see the version note in the class description) |
System.IO.TextReader | in | the input to split up into tokens |
Methods
IsTokenChar(Int32)
Accepts characters in a Unicode Letter category or in the NonspacingMark category.
Declaration
protected override bool IsTokenChar(int c)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | c | |
Returns
Type | Description |
---|---|
System.Boolean | |
Overrides
Lucene.Net.Analysis.Core.LetterTokenizer.IsTokenChar(System.Int32)
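The rule stated for IsTokenChar (Letter category or NonspacingMark category) can be sketched with the .NET base class library alone. This is an illustrative approximation, not the library's source: `TokenCharSketch` and `IsLetterOrNonspacingMark` are hypothetical names, and the Unicode data consulted is whatever the running .NET runtime ships.

```csharp
using System;
using System.Globalization;

public static class TokenCharSketch
{
    // Hypothetical helper: returns true when the code point is a letter
    // (Lu, Ll, Lt, Lm, Lo) or a nonspacing mark (Mn), such as an Arabic
    // vowel diacritic.
    public static bool IsLetterOrNonspacingMark(int codePoint)
    {
        // ConvertFromUtf32 throws ArgumentOutOfRangeException for surrogate
        // code points and for values outside the Unicode range.
        string s = char.ConvertFromUtf32(codePoint);

        // The (string, index) overload resolves surrogate pairs, so
        // supplementary-plane characters are classified correctly.
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(s, 0);

        switch (category)
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.NonSpacingMark:
                return true;
            default:
                return false;
        }
    }
}
```

Under this sketch, U+0628 (ARABIC LETTER BEH) and U+064E (ARABIC FATHA) both count as token characters, while a space or a digit ends the current token.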