    Class ICUTokenizerConfig

    Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
    Inheritance
    System.Object
    ICUTokenizerConfig
    DefaultICUTokenizerConfig
    Namespace: Lucene.Net.Analysis.Icu.Segmentation
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public abstract class ICUTokenizerConfig : object

    Constructors

    ICUTokenizerConfig()

    Sole constructor. (For invocation by subclass constructors, typically implicit.)

    Declaration
    public ICUTokenizerConfig()

    Properties

    CombineCJ

    true if Han, Hiragana, and Katakana scripts should all be returned as Japanese; otherwise, false.

    Declaration
    public abstract bool CombineCJ { get; }
    Property Value
    Type Description
    System.Boolean

    Methods

    GetBreakIterator(Int32)

    Returns a BreakIterator capable of processing the given script.

    Declaration
    public abstract BreakIterator GetBreakIterator(int script)
    Parameters
    Type Name Description
    System.Int32 script
    Returns
    Type Description
    BreakIterator
    GetType(Int32, Int32)

    Returns a token type value for the given script and BreakIterator rule status.

    Declaration
    public abstract string GetType(int script, int ruleStatus)
    Parameters
    Type Name Description
    System.Int32 script
    System.Int32 ruleStatus
    Returns
    Type Description
    System.String
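
Because all three members are abstract, a custom configuration has to implement each of them. The following is a minimal sketch of one way to do that, not the library's own implementation: a hypothetical decorator that forwards every call to an existing configuration and substitutes a custom token type for a single script. The class name RelabelingTokenizerConfig, its constructor parameters, and the type label are illustrative assumptions. The ICU4N.Text namespace for BreakIterator is also an assumption, since this page does not state where that type lives.

```csharp
using System;
using ICU4N.Text; // assumed namespace of BreakIterator; not stated on this page
using Lucene.Net.Analysis.Icu.Segmentation;

// Hypothetical decorator: delegates to an inner configuration, but reports a
// custom token type for tokens of one script code.
public sealed class RelabelingTokenizerConfig : ICUTokenizerConfig
{
    private readonly ICUTokenizerConfig inner;
    private readonly int targetScript;
    private readonly string typeLabel;

    public RelabelingTokenizerConfig(ICUTokenizerConfig inner, int targetScript, string typeLabel)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
        this.typeLabel = typeLabel ?? throw new ArgumentNullException(nameof(typeLabel));
        this.targetScript = targetScript;
    }

    // Keep the inner configuration's CJ handling unchanged.
    public override bool CombineCJ => inner.CombineCJ;

    // Segmentation itself is left to the inner configuration.
    public override BreakIterator GetBreakIterator(int script)
        => inner.GetBreakIterator(script);

    // Relabel only the target script; all other scripts keep the inner type.
    public override string GetType(int script, int ruleStatus)
        => script == targetScript ? typeLabel : inner.GetType(script, ruleStatus);
}
```

An instance would wrap an existing configuration, for example a DefaultICUTokenizerConfig, and be supplied wherever an ICUTokenizerConfig is expected. The script argument is an integer script code, so the caller is responsible for passing the code that matches the writing system to relabel.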
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)