Class HMMChineseTokenizer
Tokenizer for Chinese or mixed Chinese-English text.
The analyzer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
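The workflow above can be illustrated with a minimal sketch that consumes the tokenizer directly. It assumes the Lucene.Net.Analysis.SmartCn package is referenced and that the attribute interfaces live in the Lucene.Net.Analysis.TokenAttributes namespace, as in the 4.8 line. The class name TokenizeExample and the sample text are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class; not part of the library.
public static class TokenizeExample
{
    public static void Main()
    {
        // Illustrative mixed Chinese-English input.
        var reader = new StringReader("我购买了道具和服装。I like Lucene.");

        using (var tokenizer = new HMMChineseTokenizer(reader))
        {
            // Attributes expose the current token's text and character offsets.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();

            // Reset() must be called before the first IncrementToken().
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine($"{term} [{offset.StartOffset}, {offset.EndOffset})");
            }

            // End() finalizes offsets once the stream is exhausted.
            tokenizer.End();
        }
    }
}
```

Each printed line is one segmented word with its start and end offsets in the input. The exact segmentation depends on the bundled dictionary and model, so no expected output is shown here.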
Inheritance
System.Object
AttributeSource
TokenStream
Tokenizer
SegmentingTokenizerBase
HMMChineseTokenizer
Implements
System.IDisposable
Inherited Members
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Namespace: Lucene.Net.Analysis.Cn.Smart
Assembly: Lucene.Net.Analysis.SmartCn.dll
Syntax
public class HMMChineseTokenizer : SegmentingTokenizerBase, IDisposable
Constructors
HMMChineseTokenizer(AttributeSource.AttributeFactory, TextReader)
Creates a new HMMChineseTokenizer that uses the supplied AttributeSource.AttributeFactory.
Declaration
public HMMChineseTokenizer(AttributeSource.AttributeFactory factory, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | |
System.IO.TextReader | reader |
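A short sketch of calling this overload follows. It assumes that AttributeSource lives in the Lucene.Net.Util namespace and that the static field AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY is available, as in the 4.8 line. The FactoryExample class and its Create method are hypothetical helpers, not part of the library.

```csharp
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Util;

// Hypothetical helper class; not part of the library.
public static class FactoryExample
{
    public static HMMChineseTokenizer Create(TextReader reader)
    {
        // Assumption: the library's default factory is used here.
        // A custom AttributeSource.AttributeFactory subclass could be passed instead.
        AttributeSource.AttributeFactory factory =
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

        return new HMMChineseTokenizer(factory, reader);
    }
}
```

The caller owns the returned tokenizer and should dispose of it once the token stream has been consumed.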
HMMChineseTokenizer(TextReader)
Creates a new HMMChineseTokenizer that reads from the given TextReader.
Declaration
public HMMChineseTokenizer(TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader |
Methods
IncrementWord()
Declaration
protected override bool IncrementWord()
Returns
Type | Description |
---|---|
System.Boolean |
Overrides
SegmentingTokenizerBase.IncrementWord()
Reset()
Declaration
public override void Reset()
Overrides
SegmentingTokenizerBase.Reset()
SetNextSentence(Int32, Int32)
Declaration
protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | sentenceStart | |
System.Int32 | sentenceEnd |
Overrides
SegmentingTokenizerBase.SetNextSentence(Int32, Int32)