Class HMMChineseTokenizer
Tokenizer for Chinese or mixed Chinese-English text.
This tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, and each sentence is then segmented into words.
Inheritance
System.Object
Lucene.Net.Util.AttributeSource
Lucene.Net.Analysis.TokenStream
Lucene.Net.Analysis.Tokenizer
Lucene.Net.Analysis.Util.SegmentingTokenizerBase
HMMChineseTokenizer
Implements
System.IDisposable
Inherited Members
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.BUFFERMAX
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.m_buffer
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.m_offset
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementToken()
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.End()
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IsSafeEnd(System.Char)
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Namespace: Lucene.Net.Analysis.Cn.Smart
Assembly: Lucene.Net.Analysis.SmartCn.dll
Syntax
public class HMMChineseTokenizer : SegmentingTokenizerBase, IDisposable
Constructors
HMMChineseTokenizer(AttributeSource.AttributeFactory, TextReader)
Creates a new HMMChineseTokenizer, supplying the Lucene.Net.Util.AttributeSource.AttributeFactory.
Declaration
public HMMChineseTokenizer(AttributeSource.AttributeFactory factory, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Util.AttributeSource.AttributeFactory | factory | |
System.IO.TextReader | reader | |
HMMChineseTokenizer(TextReader)
Creates a new HMMChineseTokenizer.
Declaration
public HMMChineseTokenizer(TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader | |
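The following is a minimal usage sketch, not an official sample. It assumes the standard Lucene.NET token stream workflow (Reset, then IncrementToken in a loop, then End, then Dispose) and that ICharTermAttribute and IOffsetAttribute live in the Lucene.Net.Analysis.TokenAttributes namespace, as in Lucene.NET 4.8. The class name HmmTokenizerDemo and the sample sentence are illustrative only.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical demo class; not part of Lucene.NET.
public static class HmmTokenizerDemo
{
    public static void Main()
    {
        // Mixed Chinese-English input, chosen only for illustration.
        using (var reader = new StringReader("我是中国人。I like Lucene."))
        using (var tokenizer = new HMMChineseTokenizer(reader))
        {
            // Register the attributes we want to read for each token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();

            // Reset() must be called before the first IncrementToken().
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // term.ToString() yields the text of the current token.
                Console.WriteLine($"{term} [{offset.StartOffset}, {offset.EndOffset})");
            }

            // End() records end-of-stream state such as the final offset.
            tokenizer.End();
        }
    }
}
```

The tokenizer on its own performs segmentation only; any further filtering (for example stop-word or punctuation removal) would have to be added by wrapping it in token filters.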
Methods
IncrementWord()
Declaration
protected override bool IncrementWord()
Returns
Type | Description |
---|---|
System.Boolean | |
Overrides
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementWord()
Reset()
Declaration
public override void Reset()
Overrides
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.Reset()
SetNextSentence(Int32, Int32)
Declaration
protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | sentenceStart | |
System.Int32 | sentenceEnd | |
Overrides
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.SetNextSentence(System.Int32, System.Int32)
Implements
System.IDisposable