Class ICUTokenizerConfig
Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public abstract class ICUTokenizerConfig
Constructors
ICUTokenizerConfig()
Sole constructor. (For invocation by subclass constructors, typically implicit.)
Declaration
public ICUTokenizerConfig()
Properties
CombineCJ
true if the Han, Hiragana, and Katakana scripts should all be returned as Japanese.
Declaration
public abstract bool CombineCJ { get; }
Property Value
Type | Description |
---|---|
System.Boolean | |
Methods
GetBreakIterator(Int32)
Returns a BreakIterator capable of processing the given script.
Declaration
public abstract BreakIterator GetBreakIterator(int script)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | script | |
Returns
Type | Description |
---|---|
ICU4N.Text.BreakIterator | |
GetType(Int32, Int32)
Returns a token type value for the given script and BreakIterator rule status.
Declaration
public abstract string GetType(int script, int ruleStatus)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | script | |
System.Int32 | ruleStatus | |
Returns
Type | Description |
---|---|
System.String | |
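Example
Because the class is abstract, a subclass must override all three members listed above. The following is a minimal sketch, not code from the library: the class name SingleTypeTokenizerConfig and its constructor parameters are hypothetical. It uses only the members documented on this page, delegating segmentation to another configuration and replacing only the token type string.

```csharp
using System;
using ICU4N.Text;
using Lucene.Net.Analysis.Icu.Segmentation;

// Hypothetical subclass for illustration only: it wraps an existing
// ICUTokenizerConfig and reports one fixed token type for every token.
public sealed class SingleTypeTokenizerConfig : ICUTokenizerConfig
{
    private readonly ICUTokenizerConfig inner;   // configuration that supplies the segmentation
    private readonly string tokenType;           // type string returned for every token

    public SingleTypeTokenizerConfig(ICUTokenizerConfig inner, string tokenType)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
        this.tokenType = tokenType ?? throw new ArgumentNullException(nameof(tokenType));
    }

    // Keep the wrapped configuration's choice for Han, Hiragana, and Katakana.
    public override bool CombineCJ => inner.CombineCJ;

    // Segmentation is unchanged: reuse the wrapped configuration's BreakIterator.
    public override BreakIterator GetBreakIterator(int script)
        => inner.GetBreakIterator(script);

    // Ignore the script and rule status and return the fixed type string.
    public override string GetType(int script, int ruleStatus)
        => tokenType;
}
```

Wrapping an existing configuration keeps the sketch independent of how break iterators are created, which this page does not describe. A subclass that needs script-specific rules would build its own BreakIterator in GetBreakIterator(Int32) instead of delegating.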