Namespace Lucene.Net.Analysis.Icu.Segmentation
Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
Classes
DefaultICUTokenizerConfig
Default ICUTokenizerConfig that is generally applicable to many languages.
ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the ICUTokenizerConfig.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
ICUTokenizerConfig
Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
ICUTokenizerFactory
Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the ICU4N.Text.BreakIterator and typing provided by the DefaultICUTokenizerConfig.