Class QueryAutoStopWordAnalyzer
A Lucene.Net.Analysis.Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection that prevents very common words from being passed into queries.
For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index that had a term in around 50% of its documents, causing TermQueries for that term to take 2 seconds.
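As a sketch of typical usage (assuming a Lucene.NET 4.8 setup; the index path is illustrative), the analyzer wraps a base analyzer together with an open IndexReader, using the default 40% document-frequency threshold:

```csharp
// Sketch: wrap a StandardAnalyzer at query time so that terms appearing in
// more than the default 40% of documents are dropped from queries.
// The index path below is an example, not a real location.
using Lucene.Net.Analysis.Query;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

using var dir = FSDirectory.Open("/path/to/index");
using IndexReader reader = DirectoryReader.Open(dir);

var baseAnalyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var analyzer = new QueryAutoStopWordAnalyzer(
    LuceneVersion.LUCENE_48, baseAnalyzer, reader);
// Use `analyzer` when building queries (e.g. with a QueryParser) so that
// very common terms never reach a TermQuery.
```

Because stop words are computed from the supplied reader at construction time, the analyzer reflects the index as of that moment; reconstruct it if the index changes substantially.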
Namespace: Lucene.Net.Analysis.Query
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class QueryAutoStopWordAnalyzer : AnalyzerWrapper, IDisposable
Constructors
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered |
IndexReader | indexReader | Lucene.Net.Index.IndexReader to identify the stopwords from |
Exceptions
Type | Condition |
---|---|
IOException | Can be thrown while reading from the Lucene.Net.Index.IndexReader |
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<string>, int)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the
given selection of fields from terms with a document frequency greater than
the given maxDocFreq
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, int maxDocFreq)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Analyzer whose TokenStream will be filtered |
IndexReader | indexReader | Lucene.Net.Index.IndexReader to identify the stopwords from |
ICollection<string> | fields | Selection of fields to calculate stopwords for |
int | maxDocFreq | Document frequency terms should be above in order to be stopwords |
Exceptions
Type | Condition |
---|---|
IOException | Can be thrown while reading from the Lucene.Net.Index.IndexReader |
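A hypothetical sketch of this overload, restricting stop-word detection to a single field and using an absolute threshold (the index path, field name, and threshold value are examples, not recommendations):

```csharp
// Sketch: only the "body" field gets auto stop words, and a term must occur
// in more than 5000 documents to be treated as a stop word.
using Lucene.Net.Analysis.Query;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

using var dir = FSDirectory.Open("/path/to/index");
using IndexReader reader = DirectoryReader.Open(dir);

var analyzer = new QueryAutoStopWordAnalyzer(
    LuceneVersion.LUCENE_48,
    new StandardAnalyzer(LuceneVersion.LUCENE_48),
    reader,
    new[] { "body" },   // fields to scan for stop words
    5000);              // maxDocFreq: absolute document-frequency threshold
```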
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<string>, float)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the
given selection of fields from terms with a document frequency percentage
greater than the given maxPercentDocs
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, float maxPercentDocs)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered |
IndexReader | indexReader | Lucene.Net.Index.IndexReader to identify the stopwords from |
ICollection<string> | fields | Selection of fields to calculate stopwords for |
float | maxPercentDocs | The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word |
Exceptions
Type | Condition |
---|---|
IOException | Can be thrown while reading from the Lucene.Net.Index.IndexReader |
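A hypothetical sketch of this overload with a percentage threshold; `reader` stands for an already-open Lucene.Net.Index.IndexReader, and the field names and 25% cutoff are illustrative:

```csharp
// Sketch: treat a term as a stop word for the "title" and "body" fields
// when it appears in more than 25% of the index's documents.
var analyzer = new QueryAutoStopWordAnalyzer(
    LuceneVersion.LUCENE_48,
    new StandardAnalyzer(LuceneVersion.LUCENE_48),
    reader,                      // assumed: an open IndexReader
    new[] { "title", "body" },   // fields to calculate stop words for
    0.25f);                      // maxPercentDocs: between 0.0 and 1.0
```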
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, int)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all
indexed fields from terms with a document frequency greater than the given
maxDocFreq
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, int maxDocFreq)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered |
IndexReader | indexReader | Lucene.Net.Index.IndexReader to identify the stopwords from |
int | maxDocFreq | Document frequency terms should be above in order to be stopwords |
Exceptions
Type | Condition |
---|---|
IOException | Can be thrown while reading from the Lucene.Net.Index.IndexReader |
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, float)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all
indexed fields from terms with a document frequency percentage greater than
the given maxPercentDocs
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, float maxPercentDocs)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered |
IndexReader | indexReader | Lucene.Net.Index.IndexReader to identify the stopwords from |
float | maxPercentDocs | The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word |
Exceptions
Type | Condition |
---|---|
IOException | Can be thrown while reading from the Lucene.Net.Index.IndexReader |
Fields
defaultMaxDocFreqPercent
The default maximum document frequency percentage (0.4, i.e. 40% of documents) above which a term is considered a stop word. Used by the constructors that do not take an explicit frequency threshold.
Declaration
public const float defaultMaxDocFreqPercent = 0.4f;
Field Value
Type | Description |
---|---|
float |
Methods
GetStopWords()
Provides information on which stop words have been identified for all fields
Declaration
public Term[] GetStopWords()
Returns
Type | Description |
---|---|
Term[] | the stop words (as terms) |
GetStopWords(string)
Provides information on which stop words have been identified for a field
Declaration
public string[] GetStopWords(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
string | fieldName | The field for which the stop words identified at construction time will be returned |
Returns
Type | Description |
---|---|
string[] | the stop words identified for a field |
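A sketch of inspecting the derived stop words; `analyzer` stands for an already-constructed QueryAutoStopWordAnalyzer, and the "body" field name is an example. `Term` comes from Lucene.Net.Index:

```csharp
// Sketch: inspect which stop words the analyzer derived from the index.
string[] bodyStops = analyzer.GetStopWords("body");  // per-field, as strings
Term[] allStops = analyzer.GetStopWords();           // all fields, as Terms
foreach (Term t in allStops)
    Console.WriteLine(t);   // Term.ToString() renders as "field:text"
```

This can be useful for sanity-checking thresholds: if the list is empty, no term exceeded the configured document frequency; if it is very long, the threshold may be too low.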
GetWrappedAnalyzer(string)
Retrieves the wrapped Lucene.Net.Analysis.Analyzer appropriate for analyzing the field with the given name
Declaration
protected override Analyzer GetWrappedAnalyzer(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
string | fieldName | Name of the field which is to be analyzed |
Returns
Type | Description |
---|---|
Analyzer | Lucene.Net.Analysis.Analyzer for the field with the given name. Assumed to be non-null |
Overrides
WrapComponents(string, TokenStreamComponents)
Wraps / alters the given Lucene.Net.Analysis.TokenStreamComponents, taken from the wrapped Lucene.Net.Analysis.Analyzer, to form new components. It is through this method that new Lucene.Net.Analysis.TokenFilters can be added by Lucene.Net.Analysis.AnalyzerWrappers. By default, the given components are returned.
Declaration
protected override TokenStreamComponents WrapComponents(string fieldName, TokenStreamComponents components)
Parameters
Type | Name | Description |
---|---|---|
string | fieldName | Name of the field which is to be analyzed |
TokenStreamComponents | components | Lucene.Net.Analysis.TokenStreamComponents taken from the wrapped Lucene.Net.Analysis.Analyzer |
Returns
Type | Description |
---|---|
TokenStreamComponents | Wrapped / altered Lucene.Net.Analysis.TokenStreamComponents. |