Class HMMChineseTokenizer

Tokenizer for Chinese or mixed Chinese-English text.

This tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, and then each sentence is segmented into words.
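Probabilistic segmentation of this kind can be sketched in general terms. Among all ways W = (w_1, ..., w_n) of splitting a sentence into candidate words, pick the one with the highest probability. The formula below assumes a word-bigram model, where w_0 is a sentence-start marker. That is an assumption: this page does not describe the model's exact form or smoothing.

```latex
\hat{W} = \arg\max_{W} \prod_{i=1}^{n} P(w_i \mid w_{i-1})
        = \arg\min_{W} \sum_{i=1}^{n} -\log P(w_i \mid w_{i-1})
```

Taking negative logarithms turns the product into a sum of non-negative edge costs. The best segmentation under this model is therefore a shortest path through a lattice of candidate words.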

Inheritance

System.Object

AttributeSource

TokenStream

Tokenizer

SegmentingTokenizerBase

HMMChineseTokenizer

Implements

IDisposable

Inherited Members

SegmentingTokenizerBase.BUFFERMAX

SegmentingTokenizerBase.m_buffer

SegmentingTokenizerBase.m_offset

SegmentingTokenizerBase.IncrementToken()

SegmentingTokenizerBase.End()

SegmentingTokenizerBase.IsSafeEnd(Char)

Tokenizer.m_input

Tokenizer.Dispose(Boolean)

Tokenizer.CorrectOffset(Int32)

Tokenizer.SetReader(TextReader)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

Namespace: Lucene.Net.Analysis.Cn.Smart

Assembly: Lucene.Net.Analysis.SmartCn.dll

Syntax

public class HMMChineseTokenizer : SegmentingTokenizerBase, IDisposable

Constructors


HMMChineseTokenizer(AttributeSource.AttributeFactory, TextReader)

Creates a new HMMChineseTokenizer that uses the supplied AttributeSource.AttributeFactory to create its attribute instances.

Declaration

public HMMChineseTokenizer(AttributeSource.AttributeFactory factory, TextReader reader)

Parameters

Type	Name	Description
AttributeSource.AttributeFactory	factory	The factory used to create the tokenizer's attribute instances.
TextReader	reader	The input text to tokenize.
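The following is a minimal sketch of calling this constructor with an explicit factory. It passes the default factory, and you could substitute a custom AttributeSource.AttributeFactory to control which attribute implementations back the tokenizer. The code assumes the Lucene.Net.Analysis.SmartCn package is referenced and that `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` is available as in Lucene.NET 4.8. `FactoryDemo` and `Create` are hypothetical names used only for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Util;

public static class FactoryDemo
{
    // Builds a tokenizer over `reader`, passing the attribute factory explicitly.
    // Replace DEFAULT_ATTRIBUTE_FACTORY with a custom factory to change how
    // attribute instances are created.
    public static HMMChineseTokenizer Create(TextReader reader) =>
        new HMMChineseTokenizer(AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, reader);
}
```

The caller owns the returned tokenizer and should dispose of it when done.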


HMMChineseTokenizer(TextReader)

Creates a new HMMChineseTokenizer that reads from the given TextReader, using the default attribute factory.

Declaration

public HMMChineseTokenizer(TextReader reader)

Parameters

Type	Name	Description
TextReader	reader	The input text to tokenize.
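The following is a minimal sketch of the standard TokenStream consumer workflow built on this constructor: Reset, then IncrementToken in a loop, then End, then Dispose. It assumes the Lucene.Net.Analysis.SmartCn package is referenced. It also assumes the term and offset attribute interfaces live in Lucene.Net.Analysis.TokenAttributes, which may differ in earlier pre-releases. `SmartCnDemo` and `Segment` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace for ICharTermAttribute / IOffsetAttribute

public static class SmartCnDemo
{
    // Segments `text` and returns each token with its [Start, End) character offsets.
    public static List<(string Term, int Start, int End)> Segment(string text)
    {
        var tokens = new List<(string Term, int Start, int End)>();
        using (var tokenizer = new HMMChineseTokenizer(new StringReader(text)))
        {
            // Register the attributes before consuming; they are updated in place per token.
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            var offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();                 // required before the first IncrementToken()
            while (tokenizer.IncrementToken()) // false once the input is exhausted
            {
                tokens.Add((termAtt.ToString(), offsetAtt.StartOffset, offsetAtt.EndOffset));
            }
            tokenizer.End();                   // finalizes end-of-stream state such as the final offset
        }                                      // Dispose() releases the underlying reader
        return tokens;
    }
}
```

For example, `SmartCnDemo.Segment("我购买了道具和服装")` returns one entry per segmented word, each paired with its offsets in the original string.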

Methods


IncrementWord()

Advances to the next word in the current sentence.

Declaration

protected override bool IncrementWord()

Returns

Type	Description
System.Boolean	true if another word is available in the current sentence; otherwise, false.

Overrides

SegmentingTokenizerBase.IncrementWord()


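Reset()

Resets the tokenizer's state. Consumers must call Reset() before the first call to IncrementToken().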

Declaration

public override void Reset()

Overrides

SegmentingTokenizerBase.Reset()


SetNextSentence(Int32, Int32)

Provides the next input sentence for segmentation.

Declaration

protected override void SetNextSentence(int sentenceStart, int sentenceEnd)

Parameters

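Type	Name	Description
System.Int32	sentenceStart	Start index of the sentence in the buffer (inclusive).
System.Int32	sentenceEnd	End index of the sentence in the buffer (exclusive).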

Overrides

SegmentingTokenizerBase.SetNextSentence(Int32, Int32)
