
    Class HMMChineseTokenizer

    Tokenizer for Chinese or mixed Chinese-English text.

    The tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, and then each sentence is segmented into words.

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    SegmentingTokenizerBase
    HMMChineseTokenizer
    Implements
    IDisposable
    Inherited Members
    SegmentingTokenizerBase.BUFFERMAX
    SegmentingTokenizerBase.m_buffer
    SegmentingTokenizerBase.m_offset
    SegmentingTokenizerBase.IncrementToken()
    SegmentingTokenizerBase.End()
    SegmentingTokenizerBase.IsSafeEnd(Char)
    Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.Cn.Smart
    Assembly: Lucene.Net.Analysis.SmartCn.dll
    Syntax
    public class HMMChineseTokenizer : SegmentingTokenizerBase, IDisposable
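A minimal usage sketch, assuming the Lucene.Net.Analysis.SmartCn package and its bundled dictionary data are available at run time. It uses only the TextReader constructor and the inherited members listed above (AddAttribute<T>(), Reset(), IncrementToken(), End(), Dispose()). The sample sentence and the class name TokenizeExample are illustrative only; the exact tokens produced depend on the dictionary, so no output is shown.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class; not part of the library.
public static class TokenizeExample
{
    public static void Main()
    {
        // Illustrative mixed Chinese-English input.
        string text = "我是中国人。I live in Beijing.";

        using (TokenStream stream = new HMMChineseTokenizer(new StringReader(text)))
        {
            // Attributes expose the current token's text and character offsets.
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = stream.AddAttribute<IOffsetAttribute>();

            // Reset() must be called before the first IncrementToken().
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine($"{term} [{offset.StartOffset}-{offset.EndOffset}]");
            }

            // End() finalizes offsets once the stream is exhausted.
            stream.End();
        }
    }
}
```

The Reset(), IncrementToken(), End(), Dispose() order follows the general TokenStream consumption contract; skipping Reset() typically causes an exception on the first IncrementToken() call.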

    Constructors


    HMMChineseTokenizer(AttributeSource.AttributeFactory, TextReader)

    Creates a new HMMChineseTokenizer that uses the supplied AttributeSource.AttributeFactory to create its attributes.

    Declaration
    public HMMChineseTokenizer(AttributeSource.AttributeFactory factory, TextReader reader)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader reader
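A short sketch of calling this overload, assuming that AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY (from Lucene.Net.Util) is an acceptable factory for your use case. A custom factory would be passed the same way. The class name FactoryExample and the input string are illustrative only.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical example class; not part of the library.
public static class FactoryExample
{
    public static void Main()
    {
        // Assumption: the library's default factory is used here for illustration.
        AttributeSource.AttributeFactory factory =
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

        using (var tokenizer = new HMMChineseTokenizer(factory, new StringReader("今天天气很好")))
        {
            // The factory decides which implementation backs each attribute interface.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();
        }
    }
}
```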

    HMMChineseTokenizer(TextReader)

    Creates a new HMMChineseTokenizer.

    Declaration
    public HMMChineseTokenizer(TextReader reader)
    Parameters
    Type Name Description
    TextReader reader

    Methods


    IncrementWord()

    Declaration
    protected override bool IncrementWord()
    Returns
    Type Description
    System.Boolean
    Overrides
    SegmentingTokenizerBase.IncrementWord()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    SegmentingTokenizerBase.Reset()

    SetNextSentence(Int32, Int32)

    Declaration
    protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
    Parameters
    Type Name Description
    System.Int32 sentenceStart
    System.Int32 sentenceEnd
    Overrides
    SegmentingTokenizerBase.SetNextSentence(Int32, Int32)

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)