Class QueryAutoStopWordAnalyzer
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.
For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
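The following is a minimal usage sketch, not an official sample. It assumes the Lucene.NET 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced; the field name "body", the sample documents, the 0.5 threshold, and the `QueryAutoStopWordExample` class are hypothetical choices made for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Query;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class QueryAutoStopWordExample
{
    // Hypothetical helper: indexes three tiny documents in memory and returns
    // the stop words the analyzer identifies for the "body" field.
    public static string[] IdentifyStopWords()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        Analyzer baseAnalyzer = new WhitespaceAnalyzer(version);

        using (var directory = new RAMDirectory())
        {
            // Build a small index; "the" occurs in every document.
            using (var writer = new IndexWriter(directory, new IndexWriterConfig(version, baseAnalyzer)))
            {
                foreach (string body in new[] { "the quick fox", "the lazy dog", "the end" })
                {
                    var doc = new Document();
                    doc.Add(new TextField("body", body, Field.Store.NO));
                    writer.AddDocument(doc);
                }
            }

            // Wrap the base analyzer; terms found in more than half of the
            // documents are treated as stop words at query time.
            using (var reader = DirectoryReader.Open(directory))
            using (var queryAnalyzer = new QueryAutoStopWordAnalyzer(version, baseAnalyzer, reader, 0.5f))
            {
                return queryAnalyzer.GetStopWords("body");
            }
        }
    }

    public static void Main()
    {
        foreach (string word in IdentifyStopWords())
        {
            Console.WriteLine(word);
        }
    }
}
```

In a real application the wrapping analyzer would typically be handed to a query parser rather than inspected directly; the sketch only lists the identified stop words to make the effect visible.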
Implements
System.IDisposable
Namespace: Lucene.Net.Analysis.Query
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class QueryAutoStopWordAnalyzer : AnalyzerWrapper, IDisposable
Constructors
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent (0.4).
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Analyzer whose TokenStream will be filtered |
IndexReader | indexReader | IndexReader to identify the stopwords from |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | Can be thrown while reading from the IndexReader |
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<String>, Int32)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the
given selection of fields from terms with a document frequency greater than
the given maxDocFreq
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, int maxDocFreq)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Analyzer whose TokenStream will be filtered |
IndexReader | indexReader | IndexReader to identify the stopwords from |
System.Collections.Generic.ICollection<System.String> | fields | Selection of fields to calculate stopwords for |
System.Int32 | maxDocFreq | Document frequency a term must exceed to be treated as a stopword |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | Can be thrown while reading from the IndexReader |
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, ICollection<String>, Single)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the
given selection of fields from terms with a document frequency percentage
greater than the given maxPercentDocs
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, ICollection<string> fields, float maxPercentDocs)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Analyzer whose TokenStream will be filtered |
IndexReader | indexReader | IndexReader to identify the stopwords from |
System.Collections.Generic.ICollection<System.String> | fields | Selection of fields to calculate stopwords for |
System.Single | maxPercentDocs | The maximum fraction of index documents (between 0.0 and 1.0) that may contain a term; a term found in more documents than this is treated as a stop word |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | Can be thrown while reading from the IndexReader |
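To make the relationship between maxPercentDocs and maxDocFreq concrete, here is a self-contained sketch that does not use Lucene.NET at all. It assumes the fraction is converted to an absolute cutoff by multiplying it by the number of documents and truncating, which is how the upstream Java implementation behaves; this page does not confirm that detail. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordThresholdSketch
{
    // Hypothetical helper: converts a fraction of documents (0.0 to 1.0) into
    // an absolute document-frequency cutoff (assumed: multiply, then truncate).
    public static int ToMaxDocFreq(int numDocs, float maxPercentDocs)
    {
        return (int)(numDocs * maxPercentDocs);
    }

    // Hypothetical helper: returns the terms whose document frequency is
    // strictly greater than the cutoff, sorted for a stable result.
    public static List<string> SelectStopWords(IDictionary<string, int> docFreqs, int maxDocFreq)
    {
        return docFreqs
            .Where(entry => entry.Value > maxDocFreq)
            .Select(entry => entry.Key)
            .OrderBy(term => term, StringComparer.Ordinal)
            .ToList();
    }
}
```

Under these assumptions, an index of 10 documents with maxPercentDocs of 0.5 yields a cutoff of 5, so only terms appearing in 6 or more documents would be selected.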
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, Int32)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all
indexed fields from terms with a document frequency greater than the given
maxDocFreq
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, int maxDocFreq)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Analyzer whose TokenStream will be filtered |
IndexReader | indexReader | IndexReader to identify the stopwords from |
System.Int32 | maxDocFreq | Document frequency a term must exceed to be treated as a stopword |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | Can be thrown while reading from the IndexReader |
QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader, Single)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all
indexed fields from terms with a document frequency percentage greater than
the given maxPercentDocs
Declaration
public QueryAutoStopWordAnalyzer(LuceneVersion matchVersion, Analyzer @delegate, IndexReader indexReader, float maxPercentDocs)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Version to be used in StopFilter |
Analyzer | delegate | Analyzer whose TokenStream will be filtered |
IndexReader | indexReader | IndexReader to identify the stopwords from |
System.Single | maxPercentDocs | The maximum fraction of index documents (between 0.0 and 1.0) that may contain a term; a term found in more documents than this is treated as a stop word |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | Can be thrown while reading from the IndexReader |
Fields
defaultMaxDocFreqPercent
Declaration
public const float defaultMaxDocFreqPercent = 0.4F
Field Value
Type | Description |
---|---|
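System.Single | Default document frequency threshold (0.4, as a fraction of all documents) used by the QueryAutoStopWordAnalyzer(LuceneVersion, Analyzer, IndexReader) constructor |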
Methods
GetStopWords()
Provides information on which stop words have been identified for all fields
Declaration
public Term[] GetStopWords()
Returns
Type | Description |
---|---|
Term[] | the stop words (as terms) |
GetStopWords(String)
Provides information on which stop words have been identified for a field
Declaration
public string[] GetStopWords(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | The field for which the identified stop words will be returned |
Returns
Type | Description |
---|---|
System.String[] | the stop words identified for a field |
GetWrappedAnalyzer(String)
Declaration
protected override Analyzer GetWrappedAnalyzer(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | |
Returns
Type | Description |
---|---|
Analyzer | |
Overrides
AnalyzerWrapper.GetWrappedAnalyzer(String)
WrapComponents(String, TokenStreamComponents)
Declaration
protected override TokenStreamComponents WrapComponents(string fieldName, TokenStreamComponents components)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | |
TokenStreamComponents | components | |
Returns
Type | Description |
---|---|
TokenStreamComponents | |