Class ArabicLetterTokenizer
Tokenizer that breaks text into runs of letters and diacritics.
The standard Letter tokenizer fails on diacritics, because it splits a word wherever a diacritic appears. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating ArabicLetterTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.
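The difference between a letter-only tokenizer and one that also accepts diacritics can be sketched without Lucene.NET at all. The snippet below is an illustration under stated assumptions, not the library's implementation: `LetterRunSketch`, `SplitRuns`, `LettersOnly`, and `LettersAndMarks` are hypothetical names, and the sketch works on UTF-16 `char` values for brevity rather than on the int code points that CharTokenizer uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper class; not part of Lucene.NET.
public static class LetterRunSketch
{
    // Splits text into maximal runs of characters accepted by isTokenChar.
    public static List<string> SplitRuns(string text, Func<char, bool> isTokenChar)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char ch in text)
        {
            if (isTokenChar(ch))
            {
                current.Append(ch);
            }
            else if (current.Length > 0)
            {
                // A rejected character ends the current run.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Letter-only predicate: diacritics end a run.
    public static bool LettersOnly(char ch)
    {
        return char.IsLetter(ch);
    }

    // Letter-or-diacritic predicate: nonspacing marks stay inside the run.
    public static bool LettersAndMarks(char ch)
    {
        return char.IsLetter(ch)
            || CharUnicodeInfo.GetUnicodeCategory(ch) == UnicodeCategory.NonSpacingMark;
    }
}
```

For the vocalized Arabic word `"\u0643\u0650\u062A\u064E\u0627\u0628"` (kaf, kasra, teh, fatha, alef, beh), `SplitRuns` with `LettersOnly` breaks the word at each of the two diacritics, while `LettersAndMarks` keeps it as a single run.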
Inheritance
System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    Lucene.Net.Analysis.Util.CharTokenizer
    Lucene.Net.Analysis.Core.LetterTokenizer
    ArabicLetterTokenizer
Implements
System.IDisposable
Inherited Members
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Namespace: Lucene.Net.Analysis.Ar
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Obsolete("(3.1) Use StandardTokenizer instead.")]
public class ArabicLetterTokenizer : LetterTokenizer, IDisposable
Constructors
ArabicLetterTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Construct a new ArabicLetterTokenizer using a given Lucene.Net.Util.AttributeSource.AttributeFactory.
Declaration
public ArabicLetterTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader in)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene version to match - See Lucene.Net.Util.LuceneVersion. | 
| Lucene.Net.Util.AttributeSource.AttributeFactory | factory | the attribute factory to use for this Tokenizer | 
| System.IO.TextReader | in | the input to split up into tokens | 
ArabicLetterTokenizer(LuceneVersion, TextReader)
Construct a new ArabicLetterTokenizer.
Declaration
public ArabicLetterTokenizer(LuceneVersion matchVersion, TextReader in)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene.Net.Util.LuceneVersion to match | 
| System.IO.TextReader | in | the input to split up into tokens | 
Methods
IsTokenChar(Int32)
Accepts characters in the Unicode Letter category or the NonspacingMark category.
Declaration
protected override bool IsTokenChar(int c)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Int32 | c | the character (as an int) to test | 
Returns
| Type | Description | 
|---|---|
| System.Boolean | true if c is a letter or a nonspacing mark; otherwise false | 
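The check described above can be sketched as a standalone predicate on int code points. This is a minimal sketch, assuming that "Letter category" means the five Unicode letter categories (Lu, Ll, Lt, Lm, Lo); `TokenCharSketch` is a hypothetical class that does not call into Lucene.NET, and the library's own implementation may differ in detail.

```csharp
using System;
using System.Globalization;

// Hypothetical standalone class; not part of Lucene.NET.
public static class TokenCharSketch
{
    // Returns true when the code point is a letter or a nonspacing mark.
    // Note: char.ConvertFromUtf32 throws ArgumentOutOfRangeException for
    // lone surrogates (0xD800-0xDFFF) and values above 0x10FFFF.
    public static bool IsTokenChar(int codePoint)
    {
        // Going through a string lets supplementary code points
        // (above U+FFFF) be classified as one character.
        string s = char.ConvertFromUtf32(codePoint);
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(s, 0);

        switch (category)
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.NonSpacingMark:
                return true;
            default:
                return false;
        }
    }
}
```

Taking an int rather than a char is what allows characters outside the Basic Multilingual Plane to be tested as a single unit instead of as two surrogate halves.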
Overrides
Lucene.Net.Analysis.Core.LetterTokenizer.IsTokenChar(System.Int32)