Class CommonTermsQuery
A query that executes high-frequency terms in an optional sub-query to prevent slow queries due to "common" terms like stopwords. This query builds two queries from the terms added via Add(Term): low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. Scores produced by this query will differ slightly from those of a plain Lucene.Net.Search.BooleanQuery scorer, mainly due to differences in the Coord(int, int) number of leaf queries in the required boolean clause. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. This query can improve query execution times significantly if applicable.
CommonTermsQuery has several advantages over stopword filtering at index or query time, since a term can be "classified" based on its actual document frequency in the index, and it can prevent slow queries even across domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms need to match in order to match a document.
Collection initializer note: To create and populate a CommonTermsQuery in a single statement, you can use the following example as a guide (the constructor arguments shown are illustrative):
var query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, 0.1f) {
    new Term("field", "microsoft"),
    new Term("field", "office")
};
Namespace: Lucene.Net.Queries
Assembly: Lucene.Net.Queries.dll
Syntax
public class CommonTermsQuery : Query, IEnumerable<Term>, IEnumerable
Constructors
CommonTermsQuery(Occur, Occur, float)
Creates a new CommonTermsQuery
Declaration
public CommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency)
Parameters
Type | Name | Description |
---|---|---|
Occur | highFreqOccur | Lucene.Net.Search.Occur used for high frequency terms |
Occur | lowFreqOccur | Lucene.Net.Search.Occur used for low frequency terms |
float | maxTermFrequency | a value in [0..1) (or an absolute number >=1) representing the maximum threshold of a term's document frequency for it to be considered a low-frequency term. |
Exceptions
Type | Condition |
---|---|
ArgumentException | if Lucene.Net.Search.Occur.MUST_NOT is passed as highFreqOccur or lowFreqOccur |
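The maxTermFrequency convention above (a fraction in [0..1) or an absolute count >=1) can be illustrated with a small self-contained sketch. IsHighFrequency is a hypothetical helper written for this page, not a Lucene.NET API, and the exact comparison and rounding the library uses are assumptions here.

```csharp
using System;

// Hypothetical helper (not part of Lucene.NET): returns true when a term with the
// given document frequency would be treated as high-frequency.
static bool IsHighFrequency(int docFreq, int maxDoc, float maxTermFrequency)
{
    // Assumption: values >= 1 are an absolute document count,
    // values in [0..1) are a fraction of the documents in the index.
    double threshold = maxTermFrequency >= 1f
        ? maxTermFrequency
        : Math.Ceiling(maxTermFrequency * (double)maxDoc);
    return docFreq > threshold;
}

// "the" appears in 500 of 1000 documents, "lucene" in 50.
Console.WriteLine(IsHighFrequency(500, 1000, 0.1f)); // high-frequency
Console.WriteLine(IsHighFrequency(50, 1000, 0.1f));  // low-frequency
```

With a threshold of 0.1f, a term would need to occur in more than roughly 10% of documents to land in the optional high-frequency clause; with a threshold such as 5f, the cut-off is five documents regardless of index size.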
CommonTermsQuery(Occur, Occur, float, bool)
Creates a new CommonTermsQuery
Declaration
public CommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency, bool disableCoord)
Parameters
Type | Name | Description |
---|---|---|
Occur | highFreqOccur | Lucene.Net.Search.Occur used for high frequency terms |
Occur | lowFreqOccur | Lucene.Net.Search.Occur used for low frequency terms |
float | maxTermFrequency | a value in [0..1) (or an absolute number >=1) representing the maximum threshold of a term's document frequency for it to be considered a low-frequency term. |
bool | disableCoord | disables Coord(int, int) in scoring for the low / high frequency sub-queries |
Exceptions
Type | Condition |
---|---|
ArgumentException | if Lucene.Net.Search.Occur.MUST_NOT is passed as highFreqOccur or lowFreqOccur |
Fields
m_disableCoord
Declaration
protected readonly bool m_disableCoord
Field Value
Type | Description |
---|---|
bool |
m_highFreqBoost
Declaration
protected float m_highFreqBoost
Field Value
Type | Description |
---|---|
float |
m_highFreqMinNrShouldMatch
Declaration
protected float m_highFreqMinNrShouldMatch
Field Value
Type | Description |
---|---|
float |
m_highFreqOccur
Declaration
protected readonly Occur m_highFreqOccur
Field Value
Type | Description |
---|---|
Occur |
m_lowFreqBoost
Declaration
protected float m_lowFreqBoost
Field Value
Type | Description |
---|---|
float |
m_lowFreqMinNrShouldMatch
Declaration
protected float m_lowFreqMinNrShouldMatch
Field Value
Type | Description |
---|---|
float |
m_lowFreqOccur
Declaration
protected readonly Occur m_lowFreqOccur
Field Value
Type | Description |
---|---|
Occur |
m_maxTermFrequency
Declaration
protected readonly float m_maxTermFrequency
Field Value
Type | Description |
---|---|
float |
m_terms
Declaration
protected readonly IList<Term> m_terms
Field Value
Type | Description |
---|---|
IList<Term> |
Properties
HighFreqMinimumNumberShouldMatch
Gets or sets the minimum number of high-frequency optional BooleanClauses that must be satisfied in order to produce a match on the high-frequency terms query part. This property accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >=1 as an absolute number of clauses that need to match.
By default, no optional clauses are necessary for a match (unless there are no required clauses). If this property is set, the specified number of clauses is required.
Declaration
public virtual float HighFreqMinimumNumberShouldMatch { get; set; }
Property Value
Type | Description |
---|---|
float |
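The fraction-versus-absolute convention used by the minimum-should-match properties can be sketched as follows. MinShouldMatch is a hypothetical helper for illustration only; in particular, rounding the fractional case to the nearest integer (midpoints away from zero) is an assumption and may not match the library exactly.

```csharp
using System;

// Hypothetical helper (not the library's code): converts a minimum-should-match
// setting into a concrete number of optional clauses that must match.
static int MinShouldMatch(float minNrShouldMatch, int numOptional)
{
    // 0 means no optional clause is required; values >= 1 are absolute counts.
    if (minNrShouldMatch >= 1.0f || minNrShouldMatch == 0.0f)
    {
        return (int)minNrShouldMatch;
    }
    // Values in (0..1) are a fraction of the available optional clauses.
    // Assumption: round to nearest, midpoints away from zero.
    return (int)Math.Round(minNrShouldMatch * numOptional, MidpointRounding.AwayFromZero);
}

Console.WriteLine(MinShouldMatch(0.5f, 4)); // half of four optional clauses
Console.WriteLine(MinShouldMatch(2f, 10));  // absolute count, independent of clause total
```

Under these assumptions, a setting of 0.5f with four optional clauses requires two of them, while a setting of 2f requires two clauses regardless of how many are present.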
IsCoordDisabled
Gets a value indicating whether Coord(int, int) is disabled in scoring for the high- and low-frequency query instances. The top-level query always disables coords.
Declaration
public virtual bool IsCoordDisabled { get; }
Property Value
Type | Description |
---|---|
bool |
LowFreqMinimumNumberShouldMatch
Gets or sets the minimum number of low-frequency optional BooleanClauses that must be satisfied in order to produce a match on the low-frequency terms query part. This property accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >=1 as an absolute number of clauses that need to match.
By default, no optional clauses are necessary for a match (unless there are no required clauses). If this property is set, the specified number of clauses is required.
Declaration
public virtual float LowFreqMinimumNumberShouldMatch { get; set; }
Property Value
Type | Description |
---|---|
float |
Methods
Add(Term)
Adds a term to the CommonTermsQuery
Declaration
public virtual void Add(Term term)
Parameters
Type | Name | Description |
---|---|---|
Term | term | the term to add |
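As a usage sketch, a query can be built with one of the documented constructors and populated through Add(Term). This assumes the Lucene.Net and Lucene.Net.Queries packages are referenced; the field name "body", the sample terms, and the 0.1f threshold are illustrative values, not recommendations.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Queries;
using Lucene.Net.Search;

// Both sub-queries use optional (SHOULD) clauses; terms above the
// illustrative 0.1f document-frequency threshold count as high-frequency.
var query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, 0.1f);

// Terms are classified as low- or high-frequency when the query is rewritten
// against an index, not at the time they are added.
query.Add(new Term("body", "the"));
query.Add(new Term("body", "lucene"));
query.Add(new Term("body", "search"));

// Require at least one low-frequency term to match (absolute count).
query.LowFreqMinimumNumberShouldMatch = 1f;
```

The resulting query can then be passed to a searcher like any other Query instance.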
BuildQuery(int, TermContext[], Term[])
Declaration
protected virtual Query BuildQuery(int maxDoc, TermContext[] contextArray, Term[] queryTerms)
Parameters
Type | Name | Description |
---|---|---|
int | maxDoc | |
TermContext[] | contextArray | |
Term[] | queryTerms |
Returns
Type | Description |
---|---|
Query |
CalcHighFreqMinimumNumberShouldMatch(int)
Declaration
protected virtual int CalcHighFreqMinimumNumberShouldMatch(int numOptional)
Parameters
Type | Name | Description |
---|---|---|
int | numOptional |
Returns
Type | Description |
---|---|
int |
CalcLowFreqMinimumNumberShouldMatch(int)
Declaration
protected virtual int CalcLowFreqMinimumNumberShouldMatch(int numOptional)
Parameters
Type | Name | Description |
---|---|---|
int | numOptional |
Returns
Type | Description |
---|---|
int |
CollectTermContext(IndexReader, IList<AtomicReaderContext>, TermContext[], Term[])
Declaration
public virtual void CollectTermContext(IndexReader reader, IList<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms)
Parameters
Type | Name | Description |
---|---|---|
IndexReader | reader | |
IList<AtomicReaderContext> | leaves | |
TermContext[] | contextArray | |
Term[] | queryTerms |
Equals(object)
Determines whether the specified object is equal to the current object.
Declaration
public override bool Equals(object obj)
Parameters
Type | Name | Description |
---|---|---|
object | obj | The object to compare with the current object. |
Returns
Type | Description |
---|---|
bool | true if the specified object is equal to the current object; otherwise, false. |
Overrides
ExtractTerms(ISet<Term>)
Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten (Lucene.Net.Search.Query.Rewrite(Lucene.Net.Index.IndexReader)) form.
Declaration
public override void ExtractTerms(ISet<Term> terms)
Parameters
Type | Name | Description |
---|---|---|
ISet<Term> | terms |
Overrides
Exceptions
Type | Condition |
---|---|
InvalidOperationException | If this query is not yet rewritten |
GetEnumerator()
Returns an enumerator that iterates through the m_terms collection.
Declaration
public IEnumerator<Term> GetEnumerator()
Returns
Type | Description |
---|---|
IEnumerator<Term> | An enumerator that can be used to iterate through the m_terms collection. |
GetHashCode()
Serves as the default hash function.
Declaration
public override int GetHashCode()
Returns
Type | Description |
---|---|
int | A hash code for the current object. |
Overrides
NewTermQuery(Term, TermContext)
Builds a new Lucene.Net.Search.TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
Declaration
protected virtual Query NewTermQuery(Term term, TermContext context)
Parameters
Type | Name | Description |
---|---|---|
Term | term | term |
TermContext | context | the Lucene.Net.Index.TermContext to be used to create the low level term query. Can be null. |
Returns
Type | Description |
---|---|
Query | new Lucene.Net.Search.TermQuery instance |
Rewrite(IndexReader)
Expert: called to rewrite queries into primitive queries. For example, a Lucene.Net.Search.PrefixQuery will be rewritten into a Lucene.Net.Search.BooleanQuery that consists of Lucene.Net.Search.TermQuery instances.
Declaration
public override Query Rewrite(IndexReader reader)
Parameters
Type | Name | Description |
---|---|---|
IndexReader | reader |
Returns
Type | Description |
---|---|
Query |
Overrides
ToString(string)
Prints a query to a string, with field assumed to be the default field and omitted.
Declaration
public override string ToString(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
string |