    Namespace Lucene.Net.Analysis.Icu.Segmentation

    Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.

    Classes

    DefaultICUTokenizerConfig

    Default ICUTokenizerConfig that is generally applicable to many languages.

    ICUTokenizer

    Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).

    Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the ICUTokenizerConfig.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
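
    A minimal usage sketch, assuming the Lucene.Net.ICU package is referenced and that the single-argument ICUTokenizer(TextReader) constructor applies the default script-specific configuration. The class name IcuTokenizerSketch and the sample text are hypothetical; the exact tokens and type labels printed depend on the ICU rules in use, so no output is shown.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical demo class; not part of Lucene.NET.
public static class IcuTokenizerSketch
{
    public static void Main()
    {
        // Mixed-script sample input (an assumption chosen for illustration).
        using (var reader = new StringReader("Lucene.NET tokenizes ภาษาไทย and 中文 text"))
        using (var tokenizer = new ICUTokenizer(reader))
        {
            // Attributes expose the term text and the token type assigned
            // by the tokenizer configuration.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            // Standard TokenStream workflow: Reset, IncrementToken loop, End.
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine($"{term} [{type.Type}]");
            }
            tokenizer.End();
        }
    }
}
```

    Because the text is first split at script boundaries, the Latin, Thai, and Han runs in the sample are segmented independently of one another.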

    ICUTokenizerConfig

    Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.

    ICUTokenizerFactory

    Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the DefaultICUTokenizerConfig.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)