Class ICUTokenizerConfig
Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public abstract class ICUTokenizerConfig : object
Constructors
ICUTokenizerConfig()
Sole constructor. (For invocation by subclass constructors, typically implicit.)
Declaration
public ICUTokenizerConfig()
Properties
CombineCJ
true if Han, Hiragana, and Katakana scripts should all be returned as Japanese.
Declaration
public abstract bool CombineCJ { get; }
Property Value
Type | Description |
---|---|
System.Boolean | |
Methods
GetBreakIterator(Int32)
Returns a BreakIterator capable of processing a given script.
Declaration
public abstract BreakIterator GetBreakIterator(int script)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | script | |
Returns
Type | Description |
---|---|
BreakIterator | |
GetType(Int32, Int32)
Returns a token type value for a given script and BreakIterator rule status.
Declaration
public abstract string GetType(int script, int ruleStatus)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | script | |
System.Int32 | ruleStatus | |
Returns
Type | Description |
---|---|
System.String | |
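Example
The three abstract members above are all a subclass has to supply. The following is a minimal sketch, not a definitive implementation. It assumes that BreakIterator lives in the ICU4N.Text namespace and that the library's DefaultICUTokenizerConfig offers a constructor taking two booleans (cjkAsWords, myanmarAsWords). The class name NumberTaggingTokenizerConfig and the "<CUSTOM_NUM>" label are hypothetical, introduced only for illustration.

```csharp
using ICU4N.Text;                            // assumed namespace of BreakIterator
using Lucene.Net.Analysis.Icu.Segmentation;

// Hypothetical subclass: delegates to the default configuration and
// only changes the type reported for numeric tokens.
public sealed class NumberTaggingTokenizerConfig : ICUTokenizerConfig
{
    // Hypothetical token type label; not defined by Lucene.NET.
    public const string CustomNumberType = "<CUSTOM_NUM>";

    // Assumption: DefaultICUTokenizerConfig(bool cjkAsWords, bool myanmarAsWords) exists.
    private readonly ICUTokenizerConfig inner = new DefaultICUTokenizerConfig(true, true);

    // Reuse the default decision on merging Han, Hiragana, and Katakana.
    public override bool CombineCJ => inner.CombineCJ;

    // Reuse the default per-script break iterators.
    public override BreakIterator GetBreakIterator(int script)
    {
        return inner.GetBreakIterator(script);
    }

    public override string GetType(int script, int ruleStatus)
    {
        // ICU word-break rules report a status in the range 100..199 for numbers.
        if (ruleStatus >= 100 && ruleStatus < 200)
        {
            return CustomNumberType;
        }

        // Everything else keeps the default token type.
        return inner.GetType(script, ruleStatus);
    }
}
```

An instance of such a class would typically be handed to an ICUTokenizer constructor overload that accepts an ICUTokenizerConfig. Delegating to an existing configuration keeps the sketch short, because building break iterators from custom rules is the more involved part of a real subclass.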