Class QueryAutoStopWordAnalyzer

A Lucene.Net.Analysis.Analyzer, used primarily at query time, that wraps another analyzer and adds a layer of protection by preventing very common words from being passed into queries.

For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.

Inheritance

System.Object

Lucene.Net.Analysis.Analyzer

Lucene.Net.Analysis.AnalyzerWrapper

QueryAutoStopWordAnalyzer

Implements

System.IDisposable

Inherited Members

AnalyzerWrapper.WrapReader(String, TextReader)

AnalyzerWrapper.CreateComponents(String, TextReader)

AnalyzerWrapper.GetPositionIncrementGap(String)

AnalyzerWrapper.GetOffsetGap(String)

AnalyzerWrapper.InitReader(String, TextReader)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)

Analyzer.GetTokenStream(String, TextReader)

Analyzer.GetTokenStream(String, String)

Lucene.Net.Analysis.Analyzer.Strategy

Lucene.Net.Analysis.Analyzer.Dispose()

Analyzer.Dispose(Boolean)

Lucene.Net.Analysis.Analyzer.GLOBAL_REUSE_STRATEGY

Lucene.Net.Analysis.Analyzer.PER_FIELD_REUSE_STRATEGY

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: Lucene.Net.Analysis.Query

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public sealed class QueryAutoStopWordAnalyzer : AnalyzerWrapper, IDisposable

Constructors

QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent (0.4, that is, 40% of documents)

Declaration

public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	Version to be used in StopFilter
Lucene.Net.Analysis.Analyzer	delegate	Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered
Lucene.Net.Index.IndexReader	indexReader	Lucene.Net.Index.IndexReader to identify the stopwords from

Exceptions

Type	Condition
System.IO.IOException	Can be thrown while reading from the Lucene.Net.Index.IndexReader

QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<String>, Int32)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq

Declaration

public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, int maxDocFreq)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	Version to be used in StopFilter
Lucene.Net.Analysis.Analyzer	delegate	Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered
Lucene.Net.Index.IndexReader	indexReader	Lucene.Net.Index.IndexReader to identify the stopwords from
System.Collections.Generic.ICollection<System.String>	fields	Selection of fields to calculate stopwords for
System.Int32	maxDocFreq	Document frequency threshold; terms whose document frequency is above this value are treated as stopwords

Exceptions

Type	Condition
System.IO.IOException	Can be thrown while reading from the Lucene.Net.Index.IndexReader

QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<String>, Single)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs

Declaration

public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, float maxPercentDocs)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	Version to be used in StopFilter
Lucene.Net.Analysis.Analyzer	delegate	Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered
Lucene.Net.Index.IndexReader	indexReader	Lucene.Net.Index.IndexReader to identify the stopwords from
System.Collections.Generic.ICollection<System.String>	fields	Selection of fields to calculate stopwords for
System.Single	maxPercentDocs	The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in more documents than this is considered to be a stop word

Exceptions

Type	Condition
System.IO.IOException	Can be thrown while reading from the Lucene.Net.Index.IndexReader

QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, Int32)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

Declaration

public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, int maxDocFreq)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	Version to be used in StopFilter
Lucene.Net.Analysis.Analyzer	delegate	Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered
Lucene.Net.Index.IndexReader	indexReader	Lucene.Net.Index.IndexReader to identify the stopwords from
System.Int32	maxDocFreq	Document frequency threshold; terms whose document frequency is above this value are treated as stopwords

Exceptions

Type	Condition
System.IO.IOException	Can be thrown while reading from the Lucene.Net.Index.IndexReader

QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, Single)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs

Declaration

public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, float maxPercentDocs)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	Version to be used in StopFilter
Lucene.Net.Analysis.Analyzer	delegate	Lucene.Net.Analysis.Analyzer whose Lucene.Net.Analysis.TokenStream will be filtered
Lucene.Net.Index.IndexReader	indexReader	Lucene.Net.Index.IndexReader to identify the stopwords from
System.Single	maxPercentDocs	The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in more documents than this is considered to be a stop word

Exceptions

Type	Condition
System.IO.IOException	Can be thrown while reading from the Lucene.Net.Index.IndexReader
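The constructors above differ only in how the cut-off is expressed: an absolute document frequency (maxDocFreq) or a fraction of the index (maxPercentDocs). The following is a minimal sketch of how the two might relate. It assumes the behavior of the upstream Java Lucene implementation, where the fraction is multiplied by the number of documents and truncated to an integer, and where a term qualifies only when its document frequency is strictly greater than the cut-off. StopWordThresholdSketch, ToMaxDocFreq and SelectStopWords are hypothetical names used for illustration; they are not part of Lucene.Net.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; not part of Lucene.Net.
public static class StopWordThresholdSketch
{
    // Assumption: a fractional threshold is converted to an absolute
    // document-frequency cut-off by multiplying and truncating.
    public static int ToMaxDocFreq(int numDocs, float maxPercentDocs)
    {
        return (int)(numDocs * maxPercentDocs);
    }

    // Assumption: a term becomes a stop word only when its document
    // frequency is strictly greater than the cut-off.
    public static ISet<string> SelectStopWords(
        IReadOnlyDictionary<string, int> docFreqByTerm, int maxDocFreq)
    {
        return new HashSet<string>(
            docFreqByTerm.Where(kv => kv.Value > maxDocFreq)
                         .Select(kv => kv.Key));
    }
}
```

Under these assumptions, an index of 1000 documents with the default fraction of 0.4 yields a cut-off of 400, so a term found in exactly 400 documents is kept while a term found in 401 documents is filtered out.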

Fields

defaultMaxDocFreqPercent

Declaration

public const float defaultMaxDocFreqPercent = 0.4F

Field Value

Type	Description
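System.Single	Default maximum fraction of documents (0.4) that may contain a term before it is treated as a stop word; used by the constructor that takes no explicit threshold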

Methods

GetStopWords()

Provides information on which stop words have been identified for all fields

Declaration

public Term[] GetStopWords()

Returns

Type	Description
Term[]	the stop words (as terms)

GetStopWords(String)

Provides information on which stop words have been identified for a field

Declaration

public string[] GetStopWords(string fieldName)

Parameters

Type	Name	Description
System.String	fieldName	The field whose identified stop words will be returned

Returns

Type	Description
System.String[]	the stop words identified for a field
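As an illustration of how the constructors and GetStopWords(String) fit together, the sketch below builds a tiny in-memory index, wraps a WhitespaceAnalyzer, and lists the stop words found for one field. It assumes Lucene.Net 4.8 with the Lucene.Net.Analysis.Common package and a C# version that supports top-level statements; the field name "body", the sample texts, and the 0.5 threshold are made up for the example.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Query;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion version = LuceneVersion.LUCENE_48;

using var directory = new RAMDirectory();
var baseAnalyzer = new WhitespaceAnalyzer(version);

// Index three small documents; "the" appears in every one of them.
using (var writer = new IndexWriter(directory, new IndexWriterConfig(version, baseAnalyzer)))
{
    foreach (var text in new[] { "the quick fox", "the lazy dog", "the end" })
    {
        var doc = new Document();
        doc.Add(new TextField("body", text, Field.Store.NO));
        writer.AddDocument(doc);
    }
}

using var reader = DirectoryReader.Open(directory);

// Treat terms found in more than half of the documents as stop words
// (0.5 is an illustrative value, not a recommendation).
using var queryAnalyzer = new QueryAutoStopWordAnalyzer(version, baseAnalyzer, reader, 0.5f);

// Inspect which terms were identified for the "body" field.
string[] stopWords = queryAnalyzer.GetStopWords("body");
Console.WriteLine("Stop words: " + string.Join(", ", stopWords));

// Analyze query text; identified stop words should be dropped from the stream.
using (TokenStream stream = queryAnalyzer.GetTokenStream("body", "the quick fox"))
{
    var term = stream.AddAttribute<ICharTermAttribute>();
    stream.Reset();
    while (stream.IncrementToken())
    {
        Console.WriteLine(term.ToString());
    }
    stream.End();
}
```

Because the stop words are computed when the analyzer is constructed, the analyzer presumably needs to be rebuilt against a fresh IndexReader for later index changes to be reflected.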

GetWrappedAnalyzer(String)

Declaration

protected override Analyzer GetWrappedAnalyzer(string fieldName)

Parameters

Type	Name	Description
System.String	fieldName

Returns

Type	Description
Lucene.Net.Analysis.Analyzer

Overrides

AnalyzerWrapper.GetWrappedAnalyzer(String)

WrapComponents(String, TokenStreamComponents)

Declaration

protected override TokenStreamComponents WrapComponents(string fieldName, TokenStreamComponents components)

Parameters

Type	Name	Description
System.String	fieldName
Lucene.Net.Analysis.TokenStreamComponents	components

Returns

Type	Description
Lucene.Net.Analysis.TokenStreamComponents

Overrides

AnalyzerWrapper.WrapComponents(String, TokenStreamComponents)
