    Class SmartChineseAnalyzer

    SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text. The analyzer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.

    Segmentation is based on a Hidden Markov Model. The Chinese word-frequency probabilities it relies on were calculated from a large training corpus.

    This analyzer requires a dictionary to provide statistical data. SmartChineseAnalyzer ships with a built-in dictionary, so it works out of the box.

    The included dictionary data is from ICTCLAS1.0. Thanks to ICTCLAS for their hard work, and for contributing the data under the Apache 2 License!
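
    The following is a minimal usage sketch, not an official sample. It assumes the Lucene.Net 4.8 packages (including Lucene.Net.Analysis.SmartCn) are referenced; the field name "content", the sample text, and the SmartCnDemo helper class are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, used only for this illustration.
public static class SmartCnDemo
{
    // Runs the analyzer over the text and collects the emitted terms.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // "content" is an arbitrary field name chosen for the example.
        using (TokenStream stream = analyzer.GetTokenStream("content", text))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                      // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();                        // finalize end-of-stream state
        }
        return terms;
    }

    public static void Main()
    {
        // Default constructor overload: uses the default stopword list.
        using (var analyzer = new SmartChineseAnalyzer(LuceneVersion.LUCENE_48))
        {
            foreach (string t in Tokenize(analyzer, "我是中国人 Lucene"))
            {
                Console.WriteLine(t);
            }
        }
    }
}
```

    The analyzer is disposed after use because it implements System.IDisposable. The exact tokens produced depend on the bundled dictionary, so no expected output is shown.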

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    System.Object
    Lucene.Net.Analysis.Analyzer
    SmartChineseAnalyzer
    Implements
    System.IDisposable
    Inherited Members
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
    Analyzer.GetTokenStream(String, TextReader)
    Analyzer.GetTokenStream(String, String)
    Analyzer.InitReader(String, TextReader)
    Analyzer.GetPositionIncrementGap(String)
    Analyzer.GetOffsetGap(String)
    Lucene.Net.Analysis.Analyzer.Strategy
    Lucene.Net.Analysis.Analyzer.Dispose()
    Analyzer.Dispose(Boolean)
    Lucene.Net.Analysis.Analyzer.GLOBAL_REUSE_STRATEGY
    Lucene.Net.Analysis.Analyzer.PER_FIELD_REUSE_STRATEGY
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Cn.Smart
    Assembly: Lucene.Net.Analysis.SmartCn.dll
    Syntax
    public sealed class SmartChineseAnalyzer : Analyzer, IDisposable

    Constructors

    SmartChineseAnalyzer(LuceneVersion)

    Create a new SmartChineseAnalyzer, using the default stopword list.

    Declaration
    public SmartChineseAnalyzer(LuceneVersion matchVersion)
    Parameters
    Type | Name | Description
    Lucene.Net.Util.LuceneVersion | matchVersion |

    SmartChineseAnalyzer(LuceneVersion, CharArraySet)

    Create a new SmartChineseAnalyzer, using the provided Lucene.Net.Analysis.Util.CharArraySet of stopwords.

    Note: the set should include punctuation, unless you want to index punctuation!

    Declaration
    public SmartChineseAnalyzer(LuceneVersion matchVersion, CharArraySet stopWords)
    Parameters
    Type | Name | Description
    Lucene.Net.Util.LuceneVersion | matchVersion |
    Lucene.Net.Analysis.Util.CharArraySet | stopWords | Lucene.Net.Analysis.Util.CharArraySet of stopwords to use.
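
    As a sketch of how a custom stopword set might be built so that punctuation is still filtered, the example below copies the default set into a new CharArraySet and adds extra words. It assumes Lucene.Net 4.8; the CustomStopWordsDemo class, the BuildStopSet helper, and the extra words are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class, used only for this illustration.
public static class CustomStopWordsDemo
{
    // Builds a modifiable stop set: default punctuation plus caller-supplied words.
    public static CharArraySet BuildStopSet(params string[] extraWords)
    {
        // Start size 16 is arbitrary; 'false' keeps matching case-sensitive.
        var stopWords = new CharArraySet(LuceneVersion.LUCENE_48, 16, false);

        // Copy the default entries, since the default set itself is unmodifiable.
        foreach (string word in SmartChineseAnalyzer.GetDefaultStopSet())
        {
            stopWords.Add(word);
        }
        foreach (string word in extraWords)
        {
            stopWords.Add(word);
        }
        return stopWords;
    }

    public static void Main()
    {
        CharArraySet stopWords = BuildStopSet("的", "了");
        using (var analyzer = new SmartChineseAnalyzer(LuceneVersion.LUCENE_48, stopWords))
        {
            Console.WriteLine("Stop set size: " + stopWords.Count);
        }
    }
}
```

    Copying the defaults first follows the note above: a set without punctuation entries would cause punctuation to be indexed.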

    SmartChineseAnalyzer(LuceneVersion, Boolean)

    Create a new SmartChineseAnalyzer, optionally using the default stopword list.

    The included default stopword list is simply a list of punctuation. If you do not use this list, punctuation will not be removed from the text!

    Declaration
    public SmartChineseAnalyzer(LuceneVersion matchVersion, bool useDefaultStopWords)
    Parameters
    Type | Name | Description
    Lucene.Net.Util.LuceneVersion | matchVersion |
    System.Boolean | useDefaultStopWords | true to use the default stopword list.
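
    To make the effect of useDefaultStopWords concrete, the sketch below tokenizes the same text with the flag set to true and to false. It assumes Lucene.Net 4.8; the StopWordToggleDemo class, the field name "content", and the sample text are illustrative. With the flag set to false, punctuation tokens are expected to remain in the output.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, used only for this illustration.
public static class StopWordToggleDemo
{
    // Tokenizes the text with or without the default (punctuation) stopword list.
    public static List<string> Tokenize(bool useDefaultStopWords, string text)
    {
        var terms = new List<string>();
        using (var analyzer = new SmartChineseAnalyzer(LuceneVersion.LUCENE_48, useDefaultStopWords))
        using (TokenStream stream = analyzer.GetTokenStream("content", text))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        const string text = "你好，世界！";
        Console.WriteLine("with defaults:    " + string.Join(" | ", Tokenize(true, text)));
        Console.WriteLine("without defaults: " + string.Join(" | ", Tokenize(false, text)));
    }
}
```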

    Methods

    CreateComponents(String, TextReader)

    Declaration
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    Parameters
    Type | Name | Description
    System.String | fieldName |
    System.IO.TextReader | reader |
    Returns
    Type | Description
    Lucene.Net.Analysis.TokenStreamComponents |
    Overrides
    Analyzer.CreateComponents(String, TextReader)

    GetDefaultStopSet()

    Returns an unmodifiable instance of the default stop-words set.

    Declaration
    public static CharArraySet GetDefaultStopSet()
    Returns
    Type | Description
    Lucene.Net.Analysis.Util.CharArraySet | An unmodifiable instance of the default stop-words set.
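
    A short sketch of inspecting the default set, assuming Lucene.Net 4.8; the DefaultStopSetDemo class is hypothetical. Because the returned instance is unmodifiable, the example only reads from it.

```csharp
using System;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.Util;

// Hypothetical helper class, used only for this illustration.
public static class DefaultStopSetDemo
{
    public static void Main()
    {
        // Static accessor; no analyzer instance is needed.
        CharArraySet defaults = SmartChineseAnalyzer.GetDefaultStopSet();

        Console.WriteLine("Entries: " + defaults.Count);

        // Enumerate the entries (read-only access).
        foreach (string entry in defaults)
        {
            Console.WriteLine(entry);
        }
    }
}
```

    To extend the defaults, copy the entries into a new CharArraySet rather than modifying the returned instance.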

    Implements

    System.IDisposable
    Copyright © 2022 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.