Class ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Text is first split at script boundaries; each single-script run is then segmented into words according to the BreakIterator and token typing provided by the ICUTokenizerConfig.
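As a rough illustration of the script-boundary behavior, the following C# sketch tokenizes a mixed Latin/Cyrillic/Thai string with the default configuration and prints each token's text and type. It assumes the Lucene.Net.Analysis.ICU package is referenced and that the token attributes live in the Lucene.Net.Analysis.TokenAttributes namespace (as in recent 4.8 betas). The exact tokens and type strings depend on the ICU data shipped with your version, so no output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

public static class IcuTokenizerDemo
{
    // Tokenizes `text` with the default configuration and returns each token's text and type.
    public static List<(string Term, string Type)> Tokenize(string text)
    {
        var tokens = new List<(string Term, string Type)>();
        using (var tokenizer = new ICUTokenizer(new StringReader(text)))
        {
            // Attributes are registered once; IncrementToken() overwrites them for each token.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset(); // must precede the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                tokens.Add((termAtt.ToString(), typeAtt.Type));
            }
            tokenizer.End();
        }
        return tokens;
    }

    public static void Main()
    {
        // Latin, Cyrillic and Thai in one input: each script run is segmented by its own rules.
        foreach (var (term, type) in Tokenize("Hello Привет ภาษาไทย 2024"))
        {
            Console.WriteLine($"{term}\t{type}");
        }
    }
}
```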
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public sealed class ICUTokenizer : Tokenizer, IDisposable
Constructors
ICUTokenizer(AttributeSource.AttributeFactory, TextReader, ICUTokenizerConfig)
Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.
Declaration
public ICUTokenizer(AttributeSource.AttributeFactory factory, TextReader input, ICUTokenizerConfig config)
Parameters
| Type | Name | Description |
|---|---|---|
| AttributeSource.AttributeFactory | factory | The AttributeSource.AttributeFactory to use. |
| System.IO.TextReader | input | The System.IO.TextReader containing the text to tokenize. |
| ICUTokenizerConfig | config | Tailored ICU4N.Text.BreakIterator configuration. |
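The three-argument overload is mainly useful when you need a custom AttributeSource.AttributeFactory. The sketch below simply passes AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY explicitly, together with a DefaultICUTokenizerConfig, and counts the resulting tokens. The two-argument (cjkAsWords, myanmarAsWords) DefaultICUTokenizerConfig constructor used here is an assumption based on recent Lucene.NET builds; verify it against the version you reference.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Util;

public static class FactoryConstructorDemo
{
    // Counts the tokens produced when every constructor argument is supplied explicitly.
    public static int CountTokens(string text)
    {
        // Assumption: DefaultICUTokenizerConfig takes (cjkAsWords, myanmarAsWords); check your Lucene.NET build.
        var config = new DefaultICUTokenizerConfig(true, true);

        using (var tokenizer = new ICUTokenizer(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, // a custom factory would go here
            new StringReader(text),
            config))
        {
            int count = 0;
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                count++;
            }
            tokenizer.End();
            return count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountTokens("The quick brown fox"));
    }
}
```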
ICUTokenizer(TextReader)
Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader.
Declaration
public ICUTokenizer(TextReader input)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.TextReader | input | The System.IO.TextReader containing the text to tokenize. |
Remarks
The default script-specific handling (a DefaultICUTokenizerConfig) is used.
The default attribute factory (AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY) is used.
ICUTokenizer(TextReader, ICUTokenizerConfig)
Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.
Declaration
public ICUTokenizer(TextReader input, ICUTokenizerConfig config)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.TextReader | input | The System.IO.TextReader containing the text to tokenize. |
| ICUTokenizerConfig | config | Tailored ICU4N.Text.BreakIterator configuration. |
Remarks
The default attribute factory is used.
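To see how a tailored configuration changes segmentation, the following sketch tokenizes the same Chinese sentence under two DefaultICUTokenizerConfig instances. It assumes a (cjkAsWords, myanmarAsWords) constructor, and that cjkAsWords = true selects ICU's dictionary-based segmentation for Han, Hiragana, and Katakana while false falls back to the UAX #29 defaults (typically one token per ideograph). Both points should be checked against your Lucene.NET version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

public static class ConfigDemo
{
    // Returns the token texts produced by ICUTokenizer under the given configuration.
    public static List<string> Terms(string text, ICUTokenizerConfig config)
    {
        var terms = new List<string>();
        using (var tokenizer = new ICUTokenizer(new StringReader(text), config))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }

    public static void Main()
    {
        const string chinese = "我购买了道具和服装";
        // Assumption: the constructor signature is (cjkAsWords, myanmarAsWords).
        var dictionary = new DefaultICUTokenizerConfig(true, true); // dictionary-based CJK words
        var perChar = new DefaultICUTokenizerConfig(false, true);   // UAX #29 default handling for CJK
        Console.WriteLine(string.Join(" | ", Terms(chinese, dictionary)));
        Console.WriteLine(string.Join(" | ", Terms(chinese, perChar)));
    }
}
```

Dictionary-based segmentation usually suits search over running Chinese or Japanese text, while per-character tokens are often combined with n-gram or bigram filters downstream.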
Methods
End()
Performs end-of-stream operations. ICUTokenizer sets its offset attribute to the final offset, the end of the consumed input.
Declaration
public override void End()
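Overrides
Lucene.Net.Analysis.TokenStream.End()
IncrementToken()
Advances the stream to the next token, updating the tokenizer's attributes (term text, offsets, and type) in place.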
Declaration
public override bool IncrementToken()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if a token was produced; false once the end of the input has been reached. |
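Overrides
Lucene.Net.Analysis.TokenStream.IncrementToken()
Reset()
Prepares the tokenizer to consume its input. It must be called before the first call to IncrementToken().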
Declaration
public override void Reset()
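These three methods follow the usual Lucene TokenStream consumer contract: Reset(), then IncrementToken() until it returns false, then End(), then Dispose(). A minimal sketch of that sequence follows; it prints each token with its offsets and returns the offset reported after End(), which for input without CharFilters should equal the input length. The attribute namespace (Lucene.Net.Analysis.TokenAttributes) is assumed as in recent 4.8 betas.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

public static class LifecycleDemo
{
    // Walks the Reset -> IncrementToken -> End -> Dispose lifecycle and returns the final offset.
    public static int FinalOffset(string text)
    {
        using (var tokenizer = new ICUTokenizer(new StringReader(text)))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            var offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();                  // 1. prepare the stream
            while (tokenizer.IncrementToken())  // 2. one iteration per token
            {
                Console.WriteLine($"{termAtt} [{offsetAtt.StartOffset}, {offsetAtt.EndOffset})");
            }
            tokenizer.End();                    // 3. offsets now report the final offset
            return offsetAtt.EndOffset;
        }                                       // 4. disposing the tokenizer also disposes its TextReader
    }

    public static void Main()
    {
        // Trailing whitespace produces no token but still counts toward the final offset.
        Console.WriteLine(FinalOffset("hello world  "));
    }
}
```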