    Class StandardAnalyzer

    Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating StandardAnalyzer:

    • As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you use a previous version number, you get the exact broken behavior for backwards compatibility.
    • As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
    • As of 2.9, StopFilter preserves position increments.
    • As of 2.4, tokens (Lucene.Net.Analysis.Token) incorrectly identified as acronyms are corrected (see LUCENE-1068).
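The filter chain described above can be observed by running a string through the analyzer. The following is a minimal sketch, not a definitive usage pattern: it assumes the Lucene.Net 4.8 packages are referenced, and the `Analyze` helper, the class name, and the field name "body" are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class StandardAnalyzerDemo
{
    // Hypothetical helper: collects the terms an analyzer emits for one string.
    public static List<string> Analyze(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // "body" is an arbitrary field name chosen for this sketch.
        using (TokenStream stream = analyzer.GetTokenStream("body", text))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                      // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // LUCENE_48 selects the most recent behavior listed above.
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Terms are lowercased, and "the" is dropped as an English stop word.
            Console.WriteLine(string.Join(", ", Analyze(analyzer, "The Quick Brown Fox")));
            // prints: quick, brown, fox
        }
    }
}
```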

    Inheritance
    System.Object
    Lucene.Net.Analysis.Analyzer
    StopwordAnalyzerBase
    StandardAnalyzer
    Implements
    System.IDisposable
    Inherited Members
    StopwordAnalyzerBase.m_stopwords
    StopwordAnalyzerBase.m_matchVersion
    StopwordAnalyzerBase.StopwordSet
    StopwordAnalyzerBase.LoadStopwordSet(Boolean, Type, String, String)
    StopwordAnalyzerBase.LoadStopwordSet(FileInfo, LuceneVersion)
    StopwordAnalyzerBase.LoadStopwordSet(TextReader, LuceneVersion)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
    Analyzer.GetTokenStream(String, TextReader)
    Analyzer.GetTokenStream(String, String)
    Analyzer.InitReader(String, TextReader)
    Analyzer.GetPositionIncrementGap(String)
    Analyzer.GetOffsetGap(String)
    Lucene.Net.Analysis.Analyzer.Strategy
    Lucene.Net.Analysis.Analyzer.Dispose()
    Analyzer.Dispose(Boolean)
    Lucene.Net.Analysis.Analyzer.GLOBAL_REUSE_STRATEGY
    Lucene.Net.Analysis.Analyzer.PER_FIELD_REUSE_STRATEGY
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Standard
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class StandardAnalyzer : StopwordAnalyzerBase, IDisposable

    Constructors

    StandardAnalyzer(LuceneVersion)

    Builds an analyzer with the default stop words (STOP_WORDS_SET).

    Declaration
    public StandardAnalyzer(LuceneVersion matchVersion)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Lucene compatibility version - See StandardAnalyzer

    StandardAnalyzer(LuceneVersion, CharArraySet)

    Builds an analyzer with the given stop words.

    Declaration
    public StandardAnalyzer(LuceneVersion matchVersion, CharArraySet stopWords)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Lucene compatibility version - See StandardAnalyzer

    CharArraySet stopWords

    stop words
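As a rough illustration of this constructor, the sketch below builds a case-insensitive CharArraySet and passes it in place of the default stop words. The helper name, the field name "body", and the sample words are assumptions made for the example; the custom set replaces STOP_WORDS_SET rather than extending it.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CustomStopWordsDemo
{
    // Hypothetical helper: analyzes text with a caller-supplied stop word list.
    public static List<string> Analyze(string text, params string[] stopWords)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        // ignoreCase: true, so lookups in the set do not depend on letter case.
        var stopSet = new CharArraySet(version, stopWords, true);
        var terms = new List<string>();
        using (var analyzer = new StandardAnalyzer(version, stopSet))
        using (TokenStream stream = analyzer.GetTokenStream("body", text))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // "the" survives because the default English stop words are not in use here.
        Console.WriteLine(string.Join(", ", Analyze("Foo the bar baz", "foo", "bar")));
        // prints: the, baz
    }
}
```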

    StandardAnalyzer(LuceneVersion, TextReader)

    Builds an analyzer with the stop words from the given reader.

    Declaration
    public StandardAnalyzer(LuceneVersion matchVersion, TextReader stopwords)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Lucene compatibility version - See StandardAnalyzer

    System.IO.TextReader stopwords

    System.IO.TextReader to read stop words from

    See Also
    WordlistLoader.GetWordSet(TextReader, LuceneVersion)
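A small sketch of the reader-based constructor follows. It assumes the reader supplies one stop word per line (the format that WordlistLoader.GetWordSet reads); a StringReader stands in for a file, and the class and method names are invented for the example. The result is inspected through the inherited StopwordSet property.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class ReaderStopWordsDemo
{
    // Hypothetical helper: loads stop words from text and returns the resulting set.
    public static CharArraySet LoadStopWords(string oneWordPerLine)
    {
        // In a real application this might be a StreamReader over a stop word file.
        using (TextReader reader = new StringReader(oneWordPerLine))
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, reader))
        {
            // StopwordSet is inherited from StopwordAnalyzerBase.
            return analyzer.StopwordSet;
        }
    }

    public static void Main()
    {
        CharArraySet stopWords = LoadStopWords("foo\nbar\n");
        Console.WriteLine(stopWords.Contains("foo")); // prints: True
        Console.WriteLine(stopWords.Contains("the")); // prints: False
    }
}
```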

    Fields

    DEFAULT_MAX_TOKEN_LENGTH

    The default maximum allowed token length.

    Declaration
    public const int DEFAULT_MAX_TOKEN_LENGTH = 255
    Field Value
    Type Description
    System.Int32

    STOP_WORDS_SET

    An unmodifiable set containing some common English words that are usually not useful for searching.

    Declaration
    public static readonly CharArraySet STOP_WORDS_SET
    Field Value
    Type Description
    CharArraySet
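Both fields can be read directly from the class. The snippet below is only a sketch (the class name and the sample words are arbitrary) that checks whether a word belongs to the default stop set and prints the default token length limit.

```csharp
using System;
using Lucene.Net.Analysis.Standard;

public static class StandardAnalyzerFieldsDemo
{
    public static void Main()
    {
        // "the" is one of the common English words in the default set.
        Console.WriteLine(StandardAnalyzer.STOP_WORDS_SET.Contains("the"));    // prints: True
        Console.WriteLine(StandardAnalyzer.STOP_WORDS_SET.Contains("lucene")); // prints: False

        // Compile-time constant, declared above as 255.
        Console.WriteLine(StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);          // prints: 255
    }
}
```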

    Properties

    MaxTokenLength

    Gets or sets the maximum allowed token length. A token that exceeds this length is discarded. A new value only takes effect the next time GetTokenStream(String, TextReader) or GetTokenStream(String, String) is called.

    Declaration
    public int MaxTokenLength { get; set; }
    Property Value
    Type Description
    System.Int32
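The effect of this property can be sketched as follows, assuming LuceneVersion.LUCENE_48 and setting the value before the first token stream is requested. The limit of 5, the helper name, and the field name "body" are arbitrary choices for the example.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthDemo
{
    // Hypothetical helper: analyzes text with the given maximum token length.
    public static List<string> Analyze(string text, int maxTokenLength)
    {
        var terms = new List<string>();
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Set before GetTokenStream so the value applies to this stream.
            analyzer.MaxTokenLength = maxTokenLength;
            using (TokenStream stream = analyzer.GetTokenStream("body", text))
            {
                ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
                stream.Reset();
                while (stream.IncrementToken())
                {
                    terms.Add(termAtt.ToString());
                }
                stream.End();
            }
        }
        return terms;
    }

    public static void Main()
    {
        // "enormous" has 8 characters, exceeds the limit of 5, and is discarded.
        Console.WriteLine(string.Join(", ", Analyze("tiny enormous word", 5)));
        // prints: tiny, word
    }
}
```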

    Methods

    CreateComponents(String, TextReader)

    Declaration
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    Parameters
    Type Name Description
    System.String fieldName
    System.IO.TextReader reader
    Returns
    Type Description
    Lucene.Net.Analysis.TokenStreamComponents
    Overrides
    Analyzer.CreateComponents(String, TextReader)

    Implements

    System.IDisposable
    Copyright © 2022 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.