Namespace Lucene.Net.Analysis.Util
Classes
CharArrayIterator
A CharacterIterator used internally with ICU4N.Text.BreakIterator.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
SegmentingTokenizerBase
Breaks text into sentences with an ICU4N.Text.BreakIterator and allows subclasses to decompose these sentences into words.
This can be used by subclasses that need sentence context for tokenization purposes, such as CJK segmenters.
Additionally, it can be used by subclasses that want to mark sentence boundaries (with a custom attribute, extra token, position increment, etc.) for downstream processing.
Note
This API is experimental and might change in incompatible ways in the next release.
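The two-level idea behind SegmentingTokenizerBase (find sentence boundaries first, then split each sentence into words) can be illustrated with a minimal sketch. This is not the Lucene.Net or ICU4N API: it uses the JDK's java.text.BreakIterator as a stand-in, and the class name SentenceWordSketch and its methods segment and words are hypothetical names chosen for illustration only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.Net or ICU4N.
final class SentenceWordSketch {

    // Outer pass: split the text into sentences, then hand each sentence
    // to the inner pass, similar in spirit to what a subclass would do.
    static List<List<String>> segment(String text) {
        List<List<String>> sentences = new ArrayList<>();
        BreakIterator sentenceIt = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentenceIt.setText(text);
        int start = sentenceIt.first();
        for (int end = sentenceIt.next(); end != BreakIterator.DONE;
                start = end, end = sentenceIt.next()) {
            sentences.add(words(text.substring(start, end)));
        }
        return sentences;
    }

    // Inner pass: decompose one sentence into words, dropping segments
    // that start with whitespace or punctuation.
    static List<String> words(String sentence) {
        List<String> words = new ArrayList<>();
        BreakIterator wordIt = BreakIterator.getWordInstance(Locale.ENGLISH);
        wordIt.setText(sentence);
        int start = wordIt.first();
        for (int end = wordIt.next(); end != BreakIterator.DONE;
                start = end, end = wordIt.next()) {
            String piece = sentence.substring(start, end);
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Each inner list holds the words of one sentence, so the
        // sentence boundaries remain visible to downstream code.
        System.out.println(segment("Hello world. How are you?"));
    }
}
```

Because the words stay grouped by sentence, a consumer could mark each boundary in whatever way suits it, for example with an extra token or a position increment, as described above.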