Class ThaiTokenizer
Tokenizer that uses ICU4N.Text.BreakIterator to tokenize Thai text.
Namespace: Lucene.Net.Analysis.Th
Assembly: Lucene.Net.ICU.dll
Syntax
public class ThaiTokenizer : SegmentingTokenizerBase, IDisposable
Remarks
This is an attempt to mimic the behavior of the JDK's java.text.BreakIterator approach
to tokenizing Thai text. Although it passes the Lucene tests, its output may still differ
from that of the JDK implementation in ways the tests do not cover.
Constructors
ThaiTokenizer(AttributeFactory, TextReader)
Creates a new ThaiTokenizer, supplying the Lucene.Net.Util.AttributeSource.AttributeFactory.
Declaration
public ThaiTokenizer(AttributeSource.AttributeFactory factory, TextReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| AttributeSource.AttributeFactory | factory | |
| TextReader | reader | |
ThaiTokenizer(TextReader)
Creates a new ThaiTokenizer.
Declaration
public ThaiTokenizer(TextReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| TextReader | reader | |
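As an illustration of how this constructor is typically used, the following is a minimal sketch, not a definitive implementation. It assumes the standard Lucene.NET consumer workflow (Reset(), IncrementToken(), End(), Dispose()) and the ICharTermAttribute and IOffsetAttribute types from Lucene.Net.Analysis.TokenAttributes. The class name ThaiTokenizerSketch and the sample text are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Th;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical demo class; not part of Lucene.NET.
public static class ThaiTokenizerSketch
{
    public static void Main()
    {
        // Make sure Thai characters can be written to the console.
        Console.OutputEncoding = Encoding.UTF8;

        // Sample Thai text chosen for illustration.
        const string text = "การที่ได้ต้องแสดงว่างานดี";

        using (var tokenizer = new ThaiTokenizer(new StringReader(text)))
        {
            // Attributes expose the current token's text and character offsets.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();                  // required before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advances to the next token
            {
                Console.WriteLine(
                    $"{termAtt} [{offsetAtt.StartOffset}, {offsetAtt.EndOffset})");
            }
            tokenizer.End();                    // signals end of stream
        }
    }
}
```

The exact token boundaries depend on the ICU4N break rules and dictionary in use, so no expected output is listed here.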
Methods
CaptureState()
Captures the state of all Lucene.Net.Util.Attributes. The return value can be passed to Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State) to restore the state of this or another Lucene.Net.Util.AttributeSource.
Declaration
public override AttributeSource.State CaptureState()
Returns
| Type | Description |
|---|---|
| AttributeSource.State | |
Overrides
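The capture and restore round trip described above can be sketched as follows. This is a hedged example that assumes the usual AttributeSource behavior: a captured state is a snapshot of the attribute values at that moment. The class name CaptureStateSketch and the sample text are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Th;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical demo class; not part of Lucene.NET.
public static class CaptureStateSketch
{
    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        using (var tokenizer = new ThaiTokenizer(new StringReader("การที่ได้ต้องแสดงว่างานดี")))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            AttributeSource.State firstToken = null;

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // Snapshot the attribute values of the first token only.
                if (firstToken == null)
                {
                    firstToken = tokenizer.CaptureState();
                }
            }
            tokenizer.End();

            if (firstToken != null)
            {
                // Restoring copies the snapshot back into the live attributes,
                // so termAtt reads as the first token again.
                tokenizer.RestoreState(firstToken);
                Console.WriteLine(termAtt.ToString());
            }
        }
    }
}
```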
IncrementWord()
Returns true if another word is available.
Declaration
protected override bool IncrementWord()
Returns
| Type | Description |
|---|---|
| bool | |
Overrides
Reset()
This method is called by a consumer before it begins consumption using Lucene.Net.Analysis.TokenStream.IncrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh. If you override this method, always call base.Reset(); otherwise,
some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will
throw InvalidOperationException on further usage).
Declaration
public override void Reset()
Overrides
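The rule about calling base.Reset() can be illustrated with a small stateful filter. This is a sketch under stated assumptions: TokenCountingFilter is a hypothetical class written for this example, and it relies on the protected m_input field that Lucene.NET's TokenFilter exposes for the wrapped stream.

```csharp
using Lucene.Net.Analysis;

// Hypothetical filter that counts the tokens passing through it.
// It is sealed because Lucene expects TokenStream subclasses (or their
// IncrementToken() method) to be sealed.
public sealed class TokenCountingFilter : TokenFilter
{
    private int count;

    public TokenCountingFilter(TokenStream input)
        : base(input)
    {
    }

    // Number of tokens seen since the last Reset().
    public int Count => count;

    public override bool IncrementToken()
    {
        if (!m_input.IncrementToken())
        {
            return false;
        }
        count++;
        return true;
    }

    public override void Reset()
    {
        // Always call the base implementation first so the wrapped stream
        // (for example a ThaiTokenizer) is reset as well.
        base.Reset();
        count = 0; // then clear this filter's own state
    }
}
```

A consumer would wrap a tokenizer, for example `new TokenCountingFilter(new ThaiTokenizer(reader))`, and call Reset() on the outermost stream before the first IncrementToken().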
SetNextSentence(int, int)
Provides the next input sentence for analysis.
Declaration
protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
Parameters
| Type | Name | Description |
|---|---|---|
| int | sentenceStart | |
| int | sentenceEnd | |
Overrides