    Class QueryAutoStopWordAnalyzer

    A Lucene.Net.Analysis.Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection that prevents very common words from being passed into queries.

    For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
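
    The wrapping behaviour described above can be sketched roughly as follows. This is a minimal illustration, not an official sample: it assumes Lucene.Net 4.8 with the Lucene.Net.Analysis.Common package, and the class name QueryAutoStopWordSketch, the field name "body", the sample documents, and the 0.5 threshold are all made up for the example. With four documents and a threshold of 0.5, a term found in all four documents (here "the") should be dropped from the query text, while a term found in only two (here "fox") should pass through.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Query;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper class, for illustration only.
public static class QueryAutoStopWordSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Builds a tiny in-memory index in which "the" occurs in every document.
    public static RAMDirectory BuildIndex(Analyzer analyzer)
    {
        var directory = new RAMDirectory();
        string[] bodies = { "the quick fox", "the lazy dog", "the red fox", "the end" };
        using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
        {
            foreach (string body in bodies)
            {
                var doc = new Document();
                doc.Add(new TextField("body", body, Field.Store.NO));
                writer.AddDocument(doc);
            }
        }
        return directory;
    }

    // Runs queryText through the analyzer and collects the tokens that survive.
    public static List<string> Tokenize(Analyzer analyzer, string field, string queryText)
    {
        var tokens = new List<string>();
        using (TokenStream stream = analyzer.GetTokenStream(field, new System.IO.StringReader(queryText)))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            stream.End();
        }
        return tokens;
    }

    // Wraps a WhitespaceAnalyzer so that terms found in more than half of the
    // indexed documents are filtered out of the analyzed query text.
    public static List<string> QueryTokens(string queryText)
    {
        Analyzer baseAnalyzer = new WhitespaceAnalyzer(MatchVersion);
        using (RAMDirectory directory = BuildIndex(baseAnalyzer))
        using (IndexReader reader = DirectoryReader.Open(directory))
        {
            var queryAnalyzer = new QueryAutoStopWordAnalyzer(MatchVersion, baseAnalyzer, reader, 0.5f);
            return Tokenize(queryAnalyzer, "body", queryText);
        }
    }
}
```

    Calling QueryAutoStopWordSketch.QueryTokens("the fox") exercises the whole path. In a real application the index would normally be analyzed with the unwrapped analyzer, and the wrapper would be passed only to the query parser.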

    Inheritance
    System.Object
    Lucene.Net.Analysis.Analyzer
    Lucene.Net.Analysis.AnalyzerWrapper
    QueryAutoStopWordAnalyzer
    Implements
    System.IDisposable
    Inherited Members
    AnalyzerWrapper.WrapReader(String, TextReader)
    AnalyzerWrapper.CreateComponents(String, TextReader)
    AnalyzerWrapper.GetPositionIncrementGap(String)
    AnalyzerWrapper.GetOffsetGap(String)
    AnalyzerWrapper.InitReader(String, TextReader)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
    Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
    Analyzer.GetTokenStream(String, TextReader)
    Analyzer.GetTokenStream(String, String)
    Lucene.Net.Analysis.Analyzer.Strategy
    Lucene.Net.Analysis.Analyzer.Dispose()
    Analyzer.Dispose(Boolean)
    Lucene.Net.Analysis.Analyzer.GLOBAL_REUSE_STRATEGY
    Lucene.Net.Analysis.Analyzer.PER_FIELD_REUSE_STRATEGY
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Query
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class QueryAutoStopWordAnalyzer : AnalyzerWrapper, IDisposable

    Constructors

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer delegate, IndexReader indexReader)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Version to be used in StopFilter

    Lucene.Net.Analysis.Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    Lucene.Net.Index.IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    Exceptions
    Type Condition
    System.IO.IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<String>, Int32)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer delegate, IndexReader indexReader, ICollection<string> fields, int maxDocFreq)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Version to be used in StopFilter

    Lucene.Net.Analysis.Analyzer delegate

    Analyzer whose TokenStream will be filtered

    Lucene.Net.Index.IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    System.Collections.Generic.ICollection<System.String> fields

    Selection of fields to calculate stopwords for

    System.Int32 maxDocFreq

    Document frequency that a term must exceed in order to be treated as a stopword

    Exceptions
    Type Condition
    System.IO.IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<String>, Single)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer delegate, IndexReader indexReader, ICollection<string> fields, float maxPercentDocs)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Version to be used in StopFilter

    Lucene.Net.Analysis.Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    Lucene.Net.Index.IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    System.Collections.Generic.ICollection<System.String> fields

    Selection of fields to calculate stopwords for

    System.Single maxPercentDocs

    The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in a larger share of documents is considered to be a stop word

    Exceptions
    Type Condition
    System.IO.IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, Int32)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Version to be used in StopFilter

    Lucene.Net.Analysis.Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    Lucene.Net.Index.IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    System.Int32 maxDocFreq

    Document frequency that a term must exceed in order to be treated as a stopword

    Exceptions
    Type Condition
    System.IO.IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader

    QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, Single)

    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.

    Declaration
    public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Version to be used in StopFilter

    Lucene.Net.Analysis.Analyzer delegate

    Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered

    Lucene.Net.Index.IndexReader indexReader

    Lucene.Net.Index.IndexReader to identify the stopwords from

    System.Single maxPercentDocs

    The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in a larger share of documents is considered to be a stop word

    Exceptions
    Type Condition
    System.IO.IOException

    Can be thrown while reading from the Lucene.Net.Index.IndexReader
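
    The constructors above accept the threshold either as an absolute document count (maxDocFreq) or as a fraction of the index (maxPercentDocs). The sketch below illustrates how the two forms relate, using plain C# with no Lucene.Net dependency. It is not the library's code: the names StopWordThresholdSketch, ToMaxDocFreq, and SelectStopWords are hypothetical, and the truncating conversion from fraction to count is an assumption modelled on the Java Lucene implementation rather than something this page states. The strict "greater than" comparison follows the constructor descriptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class StopWordThresholdSketch
{
    // Assumed conversion: scale the fraction by the document count and truncate.
    public static int ToMaxDocFreq(int numDocs, float maxPercentDocs)
    {
        return (int)(numDocs * maxPercentDocs);
    }

    // Returns the terms whose document frequency is strictly greater than maxDocFreq.
    public static SortedSet<string> SelectStopWords(IDictionary<string, int> docFreqByTerm, int maxDocFreq)
    {
        var stopWords = new SortedSet<string>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, int> entry in docFreqByTerm)
        {
            if (entry.Value > maxDocFreq)
            {
                stopWords.Add(entry.Key);
            }
        }
        return stopWords;
    }
}
```

    Under these assumptions, a 10-document index with maxPercentDocs of 0.5 gives a maxDocFreq of 5, so a term found in exactly 5 documents is kept and a term found in 6 or more becomes a stop word.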

    Fields

    defaultMaxDocFreqPercent

    Declaration
    public const float defaultMaxDocFreqPercent = 0.4F
    Field Value
    Type Description
    System.Single

    Methods

    GetStopWords()

    Provides information on which stop words have been identified for all fields

    Declaration
    public Term[] GetStopWords()
    Returns
    Type Description
    Term[]

    the stop words (as terms)

    GetStopWords(String)

    Provides information on which stop words have been identified for a field

    Declaration
    public string[] GetStopWords(string fieldName)
    Parameters
    Type Name Description
    System.String fieldName

    The field for which the identified stop words will be returned

    Returns
    Type Description
    System.String[]

    the stop words identified for a field

    GetWrappedAnalyzer(String)

    Declaration
    protected override Analyzer GetWrappedAnalyzer(string fieldName)
    Parameters
    Type Name Description
    System.String fieldName
    Returns
    Type Description
    Lucene.Net.Analysis.Analyzer
    Overrides
    AnalyzerWrapper.GetWrappedAnalyzer(String)

    WrapComponents(String, TokenStreamComponents)

    Declaration
    protected override TokenStreamComponents WrapComponents(string fieldName, TokenStreamComponents components)
    Parameters
    Type Name Description
    System.String fieldName
    Lucene.Net.Analysis.TokenStreamComponents components
    Returns
    Type Description
    Lucene.Net.Analysis.TokenStreamComponents
    Overrides
    AnalyzerWrapper.WrapComponents(String, TokenStreamComponents)

    Implements

    System.IDisposable
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.