
    Class StandardAnalyzer

    Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

    You must specify the required LuceneVersion compatibility when creating StandardAnalyzer:

    • As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you pass an earlier version number, you get the old, broken behavior, which is preserved for backwards compatibility.
    • As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
    • As of 2.9, StopFilter preserves position increments.
    • As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
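
    As an illustration of the version requirement above, the following minimal sketch creates a StandardAnalyzer and prints the tokens it produces. It assumes a Lucene.NET 4.8 package where LuceneVersion.LUCENE_48 and ICharTermAttribute (in Lucene.Net.Analysis.TokenAttributes) are available; the class name, field name and sample text are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class StandardAnalyzerDemo
{
    public static void Main()
    {
        // The version argument selects which tokenization behavior to emulate.
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader("The Quick Brown Fox")))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                     // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                // Tokens arrive lowercased; the default stop word "the" is removed.
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

    Passing an older LuceneVersion value instead should reproduce the corresponding legacy behavior listed above, which is mainly useful when reading an index built with that version.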

    Inheritance
    System.Object
    Analyzer
    StopwordAnalyzerBase
    StandardAnalyzer
    Inherited Members
    StopwordAnalyzerBase.m_stopwords
    StopwordAnalyzerBase.m_matchVersion
    StopwordAnalyzerBase.StopwordSet
    StopwordAnalyzerBase.LoadStopwordSet(Boolean, Type, String, String)
    StopwordAnalyzerBase.LoadStopwordSet(FileInfo, LuceneVersion)
    StopwordAnalyzerBase.LoadStopwordSet(TextReader, LuceneVersion)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
    Analyzer.GetTokenStream(String, TextReader)
    Analyzer.GetTokenStream(String, String)
    Analyzer.InitReader(String, TextReader)
    Analyzer.GetPositionIncrementGap(String)
    Analyzer.GetOffsetGap(String)
    Analyzer.Strategy
    Analyzer.Dispose()
    Analyzer.Dispose(Boolean)
    Analyzer.GLOBAL_REUSE_STRATEGY
    Analyzer.PER_FIELD_REUSE_STRATEGY
    Namespace: Lucene.Net.Analysis.Standard
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class StandardAnalyzer : StopwordAnalyzerBase

    Constructors


    StandardAnalyzer(LuceneVersion)

    Builds an analyzer with the default stop words (STOP_WORDS_SET).

    Declaration
    public StandardAnalyzer(LuceneVersion matchVersion)
    Parameters
    Type            Name           Description
    LuceneVersion   matchVersion   Lucene compatibility version; see the StandardAnalyzer class description.


    StandardAnalyzer(LuceneVersion, CharArraySet)

    Builds an analyzer with the given stop words.

    Declaration
    public StandardAnalyzer(LuceneVersion matchVersion, CharArraySet stopWords)
    Parameters
    Type            Name           Description
    LuceneVersion   matchVersion   Lucene compatibility version; see the StandardAnalyzer class description.
    CharArraySet    stopWords      The set of stop words to remove.
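
    A minimal sketch of this constructor, assuming the CharArraySet(LuceneVersion, ICollection<string>, bool) constructor from Lucene.Net.Analysis.Util; the class name and the word list are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CustomStopWordsDemo
{
    public static StandardAnalyzer Create()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Hypothetical domain-specific stop words; 'true' makes matching case-insensitive.
        var stopWords = new CharArraySet(version, new[] { "lorem", "ipsum" }, true);

        // These words replace the default STOP_WORDS_SET rather than extending it.
        return new StandardAnalyzer(version, stopWords);
    }
}
```

    Because the supplied set replaces the defaults, common English words such as "the" are kept unless they are added to the set explicitly.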


    StandardAnalyzer(LuceneVersion, TextReader)

    Builds an analyzer with the stop words from the given reader.

    Declaration
    public StandardAnalyzer(LuceneVersion matchVersion, TextReader stopwords)
    Parameters
    Type            Name           Description
    LuceneVersion   matchVersion   Lucene compatibility version; see the StandardAnalyzer class description.
    TextReader      stopwords      The reader to load the stop words from.

    See Also
    GetWordSet(TextReader, LuceneVersion)
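
    A short sketch of the reader-based constructor. It assumes the reader supplies one stop word per line, which is the format read by GetWordSet; the class name and the in-memory word list are hypothetical stand-ins for a real stop word file.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public static class ReaderStopWordsDemo
{
    public static StandardAnalyzer Create()
    {
        // Hypothetical list; in practice this would usually be a file opened with File.OpenText.
        using (TextReader reader = new StringReader("lorem\nipsum\n"))
        {
            // The words are read during construction, so the reader can be disposed afterwards.
            return new StandardAnalyzer(LuceneVersion.LUCENE_48, reader);
        }
    }
}
```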

    Fields


    DEFAULT_MAX_TOKEN_LENGTH

    Default maximum allowed token length

    Declaration
    public const int DEFAULT_MAX_TOKEN_LENGTH = 255
    Field Value
    Type Description
    System.Int32

    STOP_WORDS_SET

    An unmodifiable set containing some common English words that are usually not useful for searching.

    Declaration
    public static readonly CharArraySet STOP_WORDS_SET
    Field Value
    Type Description
    CharArraySet

    Properties


    MaxTokenLength

    Gets or sets the maximum allowed token length. A token that exceeds this length is discarded. A new value only takes effect the next time GetTokenStream(String, TextReader) or GetTokenStream(String, String) is called.

    Declaration
    public int MaxTokenLength { get; set; }
    Property Value
    Type Description
    System.Int32
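
    The sketch below shows how the property might be used. The limit of 5, the class name, the field name and the sample text are arbitrary choices for illustration; per the description above, tokens longer than the limit are expected to be dropped from the stream.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthDemo
{
    public static void Main()
    {
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Set the limit before requesting a token stream so that it applies to that stream.
            analyzer.MaxTokenLength = 5;

            using (TokenStream stream = analyzer.GetTokenStream("body", "short extraordinarily long words"))
            {
                ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
                stream.Reset();
                while (stream.IncrementToken())
                {
                    Console.WriteLine(term.ToString());
                }
                stream.End();
            }
        }
    }
}
```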

    Methods


    CreateComponents(String, TextReader)

    Declaration
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    Parameters
    Type Name Description
    System.String fieldName
    TextReader reader
    Returns
    Type Description
    TokenStreamComponents
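
    StandardAnalyzer is sealed, so CreateComponents cannot be overridden further. The following sketch is not the library's implementation; it uses Analyzer.NewAnonymous to assemble the chain named in the class summary (StandardTokenizer, StandardFilter, LowerCaseFilter, StopFilter), assuming LowerCaseFilter and StopFilter live in Lucene.Net.Analysis.Core. Handling of MaxTokenLength is omitted, and the class name is hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public static class StandardChainSketch
{
    public static Analyzer Create()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        return Analyzer.NewAnonymous((fieldName, reader) =>
        {
            // Tokenizer first, then each filter wraps the previous stage.
            var source = new StandardTokenizer(version, reader);
            TokenStream result = new StandardFilter(version, source);
            result = new LowerCaseFilter(version, result);
            result = new StopFilter(version, result, StandardAnalyzer.STOP_WORDS_SET);

            return new TokenStreamComponents(source, result);
        });
    }
}
```

    A hand-built chain like this is mainly useful as a starting point when an extra filter needs to be inserted between the standard stages.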
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)