Namespace Lucene.Net.Analysis.Icu.Segmentation
Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
Classes
DefaultICUTokenizerConfig
Default ICUTokenizerConfig.
ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
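A minimal usage sketch, not taken from the original documentation: it assumes the Lucene.Net.ICU NuGet package is referenced and that the single-argument ICUTokenizer(TextReader) constructor, which falls back to the default configuration, is available in the installed version. The sample text and the class name TokenizeDemo are illustrative only.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

public static class TokenizeDemo
{
    public static void Main()
    {
        // Mixed-script input (Latin and Cyrillic), chosen only for illustration.
        const string text = "Lucene.NET поиск text segmentation";

        // Assumption: this constructor uses the default tokenizer configuration.
        using (var tokenizer = new ICUTokenizer(new StringReader(text)))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // required before the first IncrementToken()
            while (tokenizer.IncrementToken()) // advances to the next token, false at end of input
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                   // finalizes end-of-stream state
        }
    }
}
```

The Reset / IncrementToken / End / Dispose sequence is the usual TokenStream consumer workflow in Lucene.NET; to tailor segmentation for a particular writing system, an ICUTokenizerConfig instance would be passed to one of the other constructors instead.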
ICUTokenizerConfig
Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
ICUTokenizerFactory
Factory for ICUTokenizer.
Words are broken across script boundaries, then segmented according to
the