Namespace Lucene.Net.Analysis.Icu.Segmentation
Classes
DefaultICUTokenizerConfig
Default ICUTokenizerConfig that is generally applicable to many languages.
ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Text is first split at script boundaries; each single-script run is then segmented according to the BreakIterator and token typing provided by the ICUTokenizerConfig.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
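A minimal usage sketch, not taken from the original documentation. It assumes the single-argument `ICUTokenizer(TextReader)` constructor, which is expected to fall back to the default script-based configuration, and the standard Lucene.NET token stream workflow (`Reset`, `IncrementToken`, `End`, `Dispose`). The class name `IcuTokenizerSketch` and the sample text are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical demo class; requires the Lucene.NET ICU package to be referenced.
public static class IcuTokenizerSketch
{
    public static void Main()
    {
        // Mixed-script sample input (Latin followed by Thai), chosen only for illustration.
        const string text = "Lucene.NET สวัสดีครับ";

        // Assumption: this constructor uses the default tokenizer configuration.
        using (var tokenizer = new ICUTokenizer(new StringReader(text)))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                      // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())      // advances to the next token, false at end of input
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                        // finalizes end-of-stream state
        }
    }
}
```

Because the tokenizer is experimental, the exact tokens produced for a given input may differ between releases, so no expected output is shown here.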
ICUTokenizerConfig
Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
ICUTokenizerFactory
Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the ICU4N.Text.BreakIterator and typing provided by the DefaultICUTokenizerConfig.