Class ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Words are broken across script boundaries, then segmented according to the ICU4N.Text.BreakIterator and token typing provided by the ICUTokenizerConfig.
Implements
System.IDisposable
Inherited Members
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public sealed class ICUTokenizer : Tokenizer, IDisposable
Constructors
ICUTokenizer(AttributeSource.AttributeFactory, TextReader, ICUTokenizerConfig)
Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.
Declaration
public ICUTokenizer(AttributeSource.AttributeFactory factory, TextReader input, ICUTokenizerConfig config)
Parameters
Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | AttributeSource.AttributeFactory to use. |
System.IO.TextReader | input | System.IO.TextReader containing text to tokenize. |
ICUTokenizerConfig | config | Tailored ICU4N.Text.BreakIterator configuration. |
ICUTokenizer(TextReader)
Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader.
Declaration
public ICUTokenizer(TextReader input)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | input | System.IO.TextReader containing text to tokenize. |
Remarks
The default script-specific handling is used.
The default attribute factory is used.
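A minimal sketch of how this constructor might be used, assuming the Lucene.Net.ICU package is referenced and that `ICharTermAttribute` lives in `Lucene.Net.Analysis.TokenAttributes` as in Lucene.NET 4.8. The class name `ICUTokenizerBasicExample` and the sample text are illustrative and not part of the library.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class; only ICUTokenizer and its members come from the library.
public static class ICUTokenizerBasicExample
{
    public static void Main()
    {
        // The sample text is an assumption chosen for illustration.
        TextReader reader = new StringReader("Hello world");

        // ICUTokenizer implements IDisposable, so a using block releases it
        // once the stream has been consumed.
        using (ICUTokenizer tokenizer = new ICUTokenizer(reader))
        {
            // Register the term attribute before consuming the stream.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // returns false once the input is exhausted
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                    // end-of-stream bookkeeping
        }
    }
}
```

Because no ICUTokenizerConfig is passed, segmentation here relies entirely on the default script-specific handling described in the remarks.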
See Also
ICUTokenizer(TextReader, ICUTokenizerConfig)
Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.
Declaration
public ICUTokenizer(TextReader input, ICUTokenizerConfig config)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | input | System.IO.TextReader containing text to tokenize. |
ICUTokenizerConfig | config | Tailored ICU4N.Text.BreakIterator configuration. |
Remarks
The default attribute factory is used.
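The two configuration-aware constructors can be sketched side by side as follows. This is an illustration under stated assumptions, not the library's recommended pattern: the helper class `ICUTokenizerConfigExample` and its `Tokenize` methods are hypothetical, the caller is expected to supply an existing ICUTokenizerConfig instance, and `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` is assumed to be the default factory exposed by `Lucene.Net.Util` in Lucene.NET 4.8.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class for illustration only.
public static class ICUTokenizerConfigExample
{
    // Uses ICUTokenizer(TextReader, ICUTokenizerConfig): the default attribute factory applies.
    public static IList<string> Tokenize(string text, ICUTokenizerConfig config)
    {
        using (var tokenizer = new ICUTokenizer(new StringReader(text), config))
        {
            return Drain(tokenizer);
        }
    }

    // Uses ICUTokenizer(AttributeFactory, TextReader, ICUTokenizerConfig) with an explicit factory.
    public static IList<string> Tokenize(
        string text, ICUTokenizerConfig config, AttributeSource.AttributeFactory factory)
    {
        using (var tokenizer = new ICUTokenizer(factory, new StringReader(text), config))
        {
            return Drain(tokenizer);
        }
    }

    // Consumes the token stream and collects each term as a string.
    private static IList<string> Drain(ICUTokenizer tokenizer)
    {
        var terms = new List<string>();
        ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

        tokenizer.Reset();
        while (tokenizer.IncrementToken())
        {
            terms.Add(term.ToString());
        }
        tokenizer.End();
        return terms;
    }
}
```

Under these assumptions, calling the three-argument overload with `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` should behave like the two-argument overload, since the remarks state that the latter uses the default attribute factory.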
Methods
End()
Declaration
public override void End()
Overrides
IncrementToken()
Declaration
public override bool IncrementToken()
Returns
Type | Description |
---|---|
System.Boolean | true if a token was produced; false once the end of the input has been reached. |
Overrides
Reset()
Declaration
public override void Reset()
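The three methods above are normally called in a fixed order: Reset() once, IncrementToken() until it returns false, then End(). The sketch below illustrates that order while also reading offsets and token types. It assumes the standard Lucene.NET 4.8 attribute interfaces `IOffsetAttribute` (with `StartOffset` and `EndOffset`) and `ITypeAttribute` (with `Type`); the class name `ICUTokenizerWorkflowExample` and the input text are illustrative.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class showing the Reset / IncrementToken / End sequence.
public static class ICUTokenizerWorkflowExample
{
    public static void Main()
    {
        // Trailing spaces are included so the final offset differs from the last token's end.
        using (var tokenizer = new ICUTokenizer(new StringReader("one two  ")))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            // Step 1: prepare the tokenizer for consumption.
            tokenizer.Reset();

            // Step 2: advance token by token; attributes are updated in place on each call.
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(
                    $"{term} start={offset.StartOffset} end={offset.EndOffset} type={type.Type}");
            }

            // Step 3: signal end of stream; by the TokenStream contract the offset
            // attribute then reflects the final offset of the input.
            tokenizer.End();
            Console.WriteLine($"final offset={offset.EndOffset}");
        }
    }
}
```

Attribute instances are reused across calls, so a value that needs to outlive the current token should be copied (for example with `term.ToString()`) before the next IncrementToken() call.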