
    Class ChineseTokenizer

    Tokenizes Chinese text into individual Chinese characters.

    ChineseTokenizer and CJKTokenizer differ in their token parsing logic.

    For example, if the Chinese text "C1C2C3C4" is to be indexed:

    • The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
    • The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.

    Because CJKTokenizer indexes overlapping two-character tokens rather than single characters, the index it creates is much larger.

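    The trade-off appears at search time: queries such as C1, C1C2, C1C3, C4C2, C1C2C3 ... all work with the ChineseTokenizer, but the CJKTokenizer will not match all of them (for example C1 or C1C3, which never appear as two-character tokens in the index).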
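The C1C2C3C4 example can be made concrete with a small sketch. The code below does not call Lucene.NET; `TokenizationSketch`, `Unigrams`, `Bigrams`, and `AllIndexed` are hypothetical helpers written only for illustration. It assumes the input contains only characters from the Basic Multilingual Plane (no surrogate pairs), and it models "a search works" as simple set membership of the query tokens in the indexed tokens, which is a simplification of real query semantics.

```csharp
using System;
using System.Collections.Generic;

public static class TokenizationSketch
{
    // One token per character, mirroring the ChineseTokenizer description.
    public static List<string> Unigrams(string text)
    {
        var tokens = new List<string>();
        foreach (char c in text)
        {
            tokens.Add(c.ToString());
        }
        return tokens;
    }

    // Overlapping two-character tokens, mirroring the CJKTokenizer description.
    public static List<string> Bigrams(string text)
    {
        var tokens = new List<string>();
        for (int i = 0; i + 1 < text.Length; i++)
        {
            tokens.Add(text.Substring(i, 2));
        }
        return tokens;
    }

    // Simplified search model: every query token must exist among the indexed tokens.
    public static bool AllIndexed(List<string> indexTokens, List<string> queryTokens)
    {
        return new HashSet<string>(indexTokens).IsSupersetOf(queryTokens);
    }

    public static void Main()
    {
        const string text = "中文分词";  // stands in for C1C2C3C4
        const string query = "中分";     // stands in for C1C3

        Console.WriteLine(string.Join(", ", Unigrams(text))); // 中, 文, 分, 词
        Console.WriteLine(string.Join(", ", Bigrams(text)));  // 中文, 文分, 分词

        Console.WriteLine(AllIndexed(Unigrams(text), Unigrams(query))); // True
        Console.WriteLine(AllIndexed(Bigrams(text), Bigrams(query)));   // False
    }
}
```

In this model the single-character index answers the C1C3 query because both characters are indexed on their own, while the two-character index has no C1C3 token to match.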

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    ChineseTokenizer
    Implements
    IDisposable
    Inherited Members
    Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.Cn
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class ChineseTokenizer : Tokenizer, IDisposable

    Constructors


    ChineseTokenizer(AttributeSource.AttributeFactory, TextReader)

    Declaration
    public ChineseTokenizer(AttributeSource.AttributeFactory factory, TextReader @in)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader in

    ChineseTokenizer(TextReader)

    Declaration
    public ChineseTokenizer(TextReader @in)
    Parameters
    Type Name Description
    TextReader in

    Methods


    End()

    Declaration
    public override sealed void End()
    Overrides
    TokenStream.End()

    IncrementToken()

    Declaration
    public override bool IncrementToken()
    Returns
    Type Description
    System.Boolean
    Overrides
    TokenStream.IncrementToken()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Tokenizer.Reset()
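
The three methods above are normally called in a fixed order: Reset() once, IncrementToken() until it returns false, then End(), with disposal at the end. The following is a minimal sketch of that loop, not a definitive usage pattern. It assumes the Lucene.Net.Analysis.Common package is referenced and that the term text is read through `ICharTermAttribute` from the `Lucene.Net.Analysis.TokenAttributes` namespace, neither of which is stated on this page. The class name `ChineseTokenizerUsage` and the sample text are illustrative only. If your Lucene.NET version flags ChineseTokenizer as obsolete, expect a compiler warning.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn;
using Lucene.Net.Analysis.TokenAttributes; // assumed home of ICharTermAttribute

public static class ChineseTokenizerUsage
{
    public static void Main()
    {
        // ChineseTokenizer(TextReader) constructor from this page; 'using' calls Dispose().
        using (var tokenizer = new ChineseTokenizer(new StringReader("中文分词")))
        {
            // AddAttribute<T>() is inherited from AttributeSource.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // prepare the stream before the first token
            while (tokenizer.IncrementToken())  // false signals the end of the stream
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                    // end-of-stream bookkeeping
        }
    }
}
```

Per the class description, each Chinese character in the input should come out as its own token.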

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)