Class ICUTokenizerConfig
Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public abstract class ICUTokenizerConfig
Constructors
ICUTokenizerConfig()
Sole constructor. (For invocation by subclass constructors, typically implicit.)
Declaration
protected ICUTokenizerConfig()
Fields
EMOJI_SEQUENCE_STATUS
Rule status value for emoji sequences.
Declaration
public const int EMOJI_SEQUENCE_STATUS = 299
Field Value
Type | Description |
---|---|
int |
Properties
CombineCJ
true if Han, Hiragana, and Katakana scripts should all be returned as Japanese; otherwise, false.
Declaration
public abstract bool CombineCJ { get; }
Property Value
Type | Description |
---|---|
bool |
Methods
GetBreakIterator(int)
Returns a break iterator capable of processing the given script.
Declaration
public abstract RuleBasedBreakIterator GetBreakIterator(int script)
Parameters
Type | Name | Description |
---|---|---|
int | script | Script code of the text to process |
Returns
Type | Description |
---|---|
RuleBasedBreakIterator |
GetType(int, int)
Returns the token type value for the given script and BreakIterator rule status.
Declaration
public abstract string GetType(int script, int ruleStatus)
Parameters
Type | Name | Description |
---|---|---|
int | script | Script code of the token text |
int | ruleStatus | Rule status reported by the BreakIterator |
Returns
Type | Description |
---|---|
string |
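Example
The three abstract members above can be implemented by wrapping an existing configuration and changing only the part that needs tailoring. The following is a minimal sketch, not the library's own implementation. The class name CustomEmojiTypeConfig and the label "<CUSTOM_EMOJI>" are hypothetical, and the ICU4N.Text namespace for RuleBasedBreakIterator is an assumption. The sketch reports a custom token type whenever the rule status equals EMOJI_SEQUENCE_STATUS and delegates everything else to the wrapped configuration.

```csharp
using System;
using ICU4N.Text;                            // assumed namespace of RuleBasedBreakIterator
using Lucene.Net.Analysis.Icu.Segmentation;

// Hypothetical subclass: reuses an existing configuration and changes only
// the token type reported for emoji sequences.
public sealed class CustomEmojiTypeConfig : ICUTokenizerConfig
{
    // Hypothetical label chosen for this sketch; it is not defined by the library.
    public const string CustomEmojiType = "<CUSTOM_EMOJI>";

    private readonly ICUTokenizerConfig inner;

    public CustomEmojiTypeConfig(ICUTokenizerConfig inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Keep the wrapped configuration's Han/Hiragana/Katakana behavior.
    public override bool CombineCJ => inner.CombineCJ;

    // Segmentation rules are left untouched: reuse the wrapped break iterators.
    public override RuleBasedBreakIterator GetBreakIterator(int script)
        => inner.GetBreakIterator(script);

    // Map the emoji rule status to the custom label; defer all other cases.
    public override string GetType(int script, int ruleStatus)
        => ruleStatus == EMOJI_SEQUENCE_STATUS
            ? CustomEmojiType
            : inner.GetType(script, ruleStatus);
}
```

Delegation keeps the sketch short because the wrapped configuration still supplies the per-script break iterators. Any existing ICUTokenizerConfig implementation can serve as the inner instance.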