Class ICUTokenizerConfig

Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.

Note

This API is experimental and might change in incompatible ways in the next release.
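
Example

The following is a minimal sketch of how a configuration might be passed to a tokenizer, not an authoritative usage pattern. It assumes the Lucene.NET 4.8 API, in particular an `ICUTokenizer(TextReader, ICUTokenizerConfig)` constructor and a `DefaultICUTokenizerConfig(bool cjkAsWords, bool myanmarAsWords)` constructor, neither of which is documented on this page. The `IcuTokenizerConfigDemo` class and its `Tokenize` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

public static class IcuTokenizerConfigDemo
{
    // Hypothetical helper: tokenizes text with the supplied configuration and
    // returns "term -> type" strings, one per token.
    public static List<string> Tokenize(string text, ICUTokenizerConfig config)
    {
        var results = new List<string>();

        // Assumption: ICUTokenizer accepts a TextReader and an ICUTokenizerConfig.
        using (var tokenizer = new ICUTokenizer(new StringReader(text), config))
        {
            var term = tokenizer.AddAttribute<ICharTermAttribute>();
            var type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // The type string is whatever the configuration's GetType(int, int) returned.
                results.Add(term.ToString() + " -> " + type.Type);
            }
            tokenizer.End();
        }

        return results;
    }

    public static void Main()
    {
        // Assumption: (cjkAsWords: true, myanmarAsWords: true).
        ICUTokenizerConfig config = new DefaultICUTokenizerConfig(true, true);

        foreach (string line in Tokenize("Lucene.NET text segmentation", config))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the tokenizer only sees the abstract `ICUTokenizerConfig` type, any subclass can be substituted for `DefaultICUTokenizerConfig` here without changing the calling code.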

Inheritance

object

ICUTokenizerConfig

DefaultICUTokenizerConfig

Inherited Members

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Icu.Segmentation

Assembly: Lucene.Net.ICU.dll

Syntax

public abstract class ICUTokenizerConfig

Constructors

ICUTokenizerConfig()

Sole constructor. (For invocation by subclass constructors, typically implicit.)

Declaration

protected ICUTokenizerConfig()

Fields

EMOJI_SEQUENCE_STATUS

Rule status for emoji sequences.

Declaration

public const int EMOJI_SEQUENCE_STATUS = 299

Field Value

Type	Description
int

Properties

CombineCJ

true if Han, Hiragana, and Katakana scripts should all be returned as Japanese; otherwise, false.

Declaration

public abstract bool CombineCJ { get; }

Property Value

Type	Description
bool

Methods

GetBreakIterator(int)

Return a break iterator capable of processing a given script.

Declaration

public abstract RuleBasedBreakIterator GetBreakIterator(int script)

Parameters

Type	Name	Description
int	script

Returns

Type	Description
RuleBasedBreakIterator

GetType(int, int)

Return a token type value for a given script and BreakIterator rule status.

Declaration

public abstract string GetType(int script, int ruleStatus)

Parameters

Type	Name	Description
int	script
int	ruleStatus

Returns

Type	Description
string
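
Example

The abstract members above can be implemented by delegating to an existing configuration and changing only the behavior of interest. The sketch below is one possible approach under stated assumptions, not the library's recommended pattern: `EmojiTypeConfig` and the `"<CUSTOM_EMOJI>"` type string are hypothetical, and `RuleBasedBreakIterator` is assumed to live in the `ICU4N.Text` namespace. Only `CombineCJ`, `GetBreakIterator(int)`, `GetType(int, int)`, and `EMOJI_SEQUENCE_STATUS` come from this page.

```csharp
using System;
using ICU4N.Text; // assumption: namespace of RuleBasedBreakIterator
using Lucene.Net.Analysis.Icu.Segmentation;

// Hypothetical subclass: wraps another configuration and relabels emoji sequences.
public sealed class EmojiTypeConfig : ICUTokenizerConfig
{
    // Illustrative token type string; not a value defined by the library.
    public const string EmojiType = "<CUSTOM_EMOJI>";

    private readonly ICUTokenizerConfig inner;

    public EmojiTypeConfig(ICUTokenizerConfig inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Script combining is left to the wrapped configuration.
    public override bool CombineCJ => inner.CombineCJ;

    // Break rules are left to the wrapped configuration.
    public override RuleBasedBreakIterator GetBreakIterator(int script)
    {
        return inner.GetBreakIterator(script);
    }

    // Only the emoji rule status is mapped to a custom type; every other
    // status falls through to the wrapped configuration.
    public override string GetType(int script, int ruleStatus)
    {
        return ruleStatus == EMOJI_SEQUENCE_STATUS
            ? EmojiType
            : inner.GetType(script, ruleStatus);
    }
}
```

Wrapping rather than reimplementing keeps the per-script break rules of the inner configuration intact, so the subclass only has to reason about the single rule status it overrides.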