    Class QueryAutoStopWordAnalyzer

    A Lucene.Net.Analysis.Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection that prevents very common words from being passed into queries.

    For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
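The wrapping described above can be sketched as follows. This is a minimal example, not the project's recommended setup: the `QueryAnalyzerFactory` class and its `Create` method are hypothetical names, and `WhitespaceAnalyzer` is only an assumed choice of inner analyzer. It uses the three-argument constructor, so the threshold is `defaultMaxDocFreqPercent`.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Query;
using Lucene.Net.Index;
using Lucene.Net.Util;

public static class QueryAnalyzerFactory
{
    // Hypothetical helper: wraps a WhitespaceAnalyzer so that terms found in
    // more than the default fraction of documents are dropped at query time.
    public static Analyzer Create(IndexReader reader)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        Analyzer inner = new WhitespaceAnalyzer(version);
        // Stop words are calculated from the reader when the analyzer is constructed.
        return new QueryAutoStopWordAnalyzer(version, inner, reader);
    }
}
```

The returned analyzer would typically be handed to whatever builds queries, while indexing continues to use the unwrapped analyzer.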

    Inheritance
    object
    Analyzer
    AnalyzerWrapper
    QueryAutoStopWordAnalyzer
    Implements
    IDisposable
    Inherited Members
    AnalyzerWrapper.GetPositionIncrementGap(string)
    AnalyzerWrapper.GetOffsetGap(string)
    Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>)
    Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>, ReuseStrategy)
    Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>, Func<string, TextReader, TextReader>)
    Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>, Func<string, TextReader, TextReader>, ReuseStrategy)
    Analyzer.GetTokenStream(string, TextReader)
    Analyzer.GetTokenStream(string, string)
    Analyzer.Strategy
    Analyzer.Dispose()
    Analyzer.GLOBAL_REUSE_STRATEGY
    Analyzer.PER_FIELD_REUSE_STRATEGY
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Query
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class QueryAutoStopWordAnalyzer : AnalyzerWrapper, IDisposable

    Constructors

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Version to be used in StopFilter

    Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    Exceptions
    Type Condition
    IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<string>, int)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, int maxDocFreq)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Version to be used in StopFilter

    Analyzer delegate

    Analyzer whose TokenStream will be filtered

    IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    ICollection<string> fields

    Selection of fields to calculate stopwords for

    int maxDocFreq

    Document frequency threshold; terms whose document frequency is greater than this value are treated as stopwords

    Exceptions
    Type Condition
    IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader
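A short sketch of the field-restricted constructor with an absolute threshold. The field names `title` and `body`, the `FieldScopedAnalyzer` class, and the choice of `WhitespaceAnalyzer` are assumptions made for illustration only.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Query;
using Lucene.Net.Index;
using Lucene.Net.Util;

public static class FieldScopedAnalyzer
{
    // Hypothetical helper: only the "title" and "body" fields (assumed names)
    // are scanned for stop words; other fields pass through unfiltered.
    public static QueryAutoStopWordAnalyzer Create(IndexReader reader, int maxDocFreq)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        ICollection<string> fields = new List<string> { "title", "body" };
        // An int argument selects the absolute-count overload,
        // not the float (fraction) overload.
        return new QueryAutoStopWordAnalyzer(
            version, new WhitespaceAnalyzer(version), reader, fields, maxDocFreq);
    }
}
```

Because the `int` and `float` overloads differ only in the type of the last argument, it is worth being explicit about the literal passed: `1` is a document count, while `1f` is a fraction of the index.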

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<string>, float)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, float maxPercentDocs)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Version to be used in StopFilter

    Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    ICollection<string> fields

    Selection of fields to calculate stopwords for

    float maxPercentDocs

    The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in more documents than this is considered to be a stop word

    Exceptions
    Type Condition
    IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, int)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, int maxDocFreq)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Version to be used in StopFilter

    Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    int maxDocFreq

    Document frequency threshold; terms whose document frequency is greater than this value are treated as stopwords

    Exceptions
    Type Condition
    IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, float)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, float maxPercentDocs)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Version to be used in StopFilter

    Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    float maxPercentDocs

    The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in more documents than this is considered to be a stop word

    Exceptions
    Type Condition
    IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader
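The relationship between the fraction-based and count-based constructors can be illustrated with plain arithmetic. This sketch assumes the behaviour of the upstream Java implementation carries over: the fraction is multiplied by the number of documents and truncated to an integer, and a term becomes a stop word only when its document frequency is strictly greater than the result. The `StopWordThreshold` class and its methods are hypothetical and not part of Lucene.Net.

```csharp
public static class StopWordThreshold
{
    // Converts a fraction of the index (0.0 to 1.0) into an absolute
    // document-frequency threshold, truncating toward zero.
    public static int ToMaxDocFreq(int numDocs, float maxPercentDocs)
    {
        return (int)(numDocs * maxPercentDocs);
    }

    // A term counts as a stop word only when its document frequency
    // is strictly greater than the threshold.
    public static bool IsStopWord(int docFreq, int maxDocFreq)
    {
        return docFreq > maxDocFreq;
    }
}
```

For a 1,000-document index and a fraction of 0.5, the threshold is 500, so a term found in exactly 500 documents is kept and a term found in 501 is filtered.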

    Fields

    defaultMaxDocFreqPercent

    The default maximum document frequency, expressed as a fraction of the documents in the index, used by the constructor that takes no explicit threshold.

    Declaration
    public const float defaultMaxDocFreqPercent = 0.4f
    Field Value
    Type Description
    float

    Methods

    GetStopWords()

    Provides information on which stop words have been identified for all fields

    Declaration
    public Term[] GetStopWords()
    Returns
    Type Description
    Term[]

    the stop words (as terms)

    GetStopWords(string)

    Provides information on which stop words have been identified for a field

    Declaration
    public string[] GetStopWords(string fieldName)
    Parameters
    Type Name Description
    string fieldName

    The field for which the identified stop words will be returned

    Returns
    Type Description
    string[]

    the stop words identified for a field
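Both `GetStopWords` overloads are useful for checking what an analyzer will filter before it is put into service. A minimal sketch follows; the `StopWordReport` class is a hypothetical helper, and the output format is an arbitrary choice.

```csharp
using System;
using Lucene.Net.Analysis.Query;
using Lucene.Net.Index;

public static class StopWordReport
{
    // Hypothetical helper: formats the stop words found for one field
    // as "field: word1, word2", sorted for stable output.
    public static string Describe(QueryAutoStopWordAnalyzer analyzer, string fieldName)
    {
        string[] words = analyzer.GetStopWords(fieldName);
        Array.Sort(words, StringComparer.Ordinal);
        return fieldName + ": " + string.Join(", ", words);
    }

    // Writes every identified stop word, across all fields, to the console.
    // Term.ToString() is relied on here for the textual form of each term.
    public static void PrintAll(QueryAutoStopWordAnalyzer analyzer)
    {
        foreach (Term term in analyzer.GetStopWords())
        {
            Console.WriteLine(term.ToString());
        }
    }
}
```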

    GetWrappedAnalyzer(string)

    Retrieves the wrapped Lucene.Net.Analysis.Analyzer appropriate for analyzing the field with the given name

    Declaration
    protected override Analyzer GetWrappedAnalyzer(string fieldName)
    Parameters
    Type Name Description
    string fieldName

    Name of the field which is to be analyzed

    Returns
    Type Description
    Analyzer

    Lucene.Net.Analysis.Analyzer for the field with the given name. Assumed to be non-null

    Overrides
    AnalyzerWrapper.GetWrappedAnalyzer(string)

    WrapComponents(string, TokenStreamComponents)

    Wraps / alters the given Lucene.Net.Analysis.TokenStreamComponents, taken from the wrapped Lucene.Net.Analysis.Analyzer, to form new components. It is through this method that new Lucene.Net.Analysis.TokenFilters can be added by Lucene.Net.Analysis.AnalyzerWrappers. By default, the given components are returned.

    Declaration
    protected override TokenStreamComponents WrapComponents(string fieldName, TokenStreamComponents components)
    Parameters
    Type Name Description
    string fieldName

    Name of the field which is to be analyzed

    TokenStreamComponents components

    Lucene.Net.Analysis.TokenStreamComponents taken from the wrapped Lucene.Net.Analysis.Analyzer

    Returns
    Type Description
    TokenStreamComponents

    Wrapped / altered Lucene.Net.Analysis.TokenStreamComponents.

    Overrides
    AnalyzerWrapper.WrapComponents(string, TokenStreamComponents)
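Since `WrapComponents` is protected, its effect is easiest to observe from outside by tokenizing text through the analyzer and looking at which terms survive. The sketch below is one way to do that; the `TokenDump` class is hypothetical, and it assumes the wrapped components drop the stop words identified for the field being analyzed.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class TokenDump
{
    // Hypothetical helper: runs text through an analyzer for the given field
    // and returns the surviving terms in order.
    public static IList<string> Analyze(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.GetTokenStream(fieldName, text))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // required before the first IncrementToken() call
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

Calling `Analyze` once with the inner analyzer and once with the `QueryAutoStopWordAnalyzer` that wraps it, on the same field and text, shows which terms the wrapper removes.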

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.