Class ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public sealed class ICUTokenizer : Tokenizer, IDisposable
Constructors
ICUTokenizer(AttributeSource.AttributeFactory, TextReader, ICUTokenizerConfig)
Constructs a new ICUTokenizer that breaks text into words from the given TextReader, using a tailored BreakIterator configuration.
Declaration
public ICUTokenizer(AttributeSource.AttributeFactory factory, TextReader input, ICUTokenizerConfig config)
Parameters
Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | AttributeSource.AttributeFactory to use |
TextReader | input | |
ICUTokenizerConfig | config | Tailored BreakIterator configuration |
ICUTokenizer(TextReader)
Constructs a new ICUTokenizer that breaks text into words from the given TextReader.
Declaration
public ICUTokenizer(TextReader input)
Parameters
Type | Name | Description |
---|---|---|
TextReader | input | |
Remarks
The default script-specific handling is used.
The default attribute factory is used.
ICUTokenizer(TextReader, ICUTokenizerConfig)
Constructs a new ICUTokenizer that breaks text into words from the given TextReader, using a tailored BreakIterator configuration.
Declaration
public ICUTokenizer(TextReader input, ICUTokenizerConfig config)
Parameters
Type | Name | Description |
---|---|---|
TextReader | input | |
ICUTokenizerConfig | config | Tailored BreakIterator configuration |
Remarks
The default attribute factory is used.
Methods
End()
Declaration
public override void End()
IncrementToken()
Declaration
public override bool IncrementToken()
Returns
Type | Description |
---|---|
System.Boolean | true if a token was produced; false once the end of the input is reached |
Reset()
Declaration
public override void Reset()
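Example

The constructors and methods above are typically used together in the usual token stream consumer sequence: Reset(), repeated calls to IncrementToken(), then End(), then disposal. The following is a minimal sketch rather than a definitive implementation. It assumes the Lucene.Net.ICU package is referenced and that ICharTermAttribute is available from the Lucene.Net.Analysis.TokenAttributes namespace. The class name ICUTokenizerExample and the sample text are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace for ICharTermAttribute

// Hypothetical example class; not part of Lucene.NET.
public static class ICUTokenizerExample
{
    public static void Main()
    {
        // Sample input mixing scripts, since words are broken across script boundaries.
        TextReader input = new StringReader("Testing ICUTokenizer with ภาษาไทย text");

        // ICUTokenizer(TextReader) uses the default script-specific handling
        // and the default attribute factory.
        using (var tokenizer = new ICUTokenizer(input))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken()) // returns false at the end of the input
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                   // signals that the stream has been fully consumed
        }                                      // Dispose() releases the underlying reader
    }
}
```

To tailor segmentation, pass an ICUTokenizerConfig to the ICUTokenizer(TextReader, ICUTokenizerConfig) overload instead; the consuming loop stays the same.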