Class CommonTermsQuery
A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the terms added via Add(Term): low-frequency terms are added to a required boolean clause, and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. Scores produced by this query differ slightly from those of a plain Lucene.Net.Search.BooleanQuery, mainly because Coord(int, int) sees a different number of leaf queries in the required boolean clause. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where applicable, this query can improve query execution times significantly.
CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so the query can prevent slow queries even across domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e. all high-frequency terms must match in order for a document to match.
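The split between the required low-frequency clause and the optional high-frequency clause, including the conjunction fallback, can be sketched in plain C#. This is a simplified illustration, not the Lucene.Net implementation: `DescribeShape` is a hypothetical helper, terms arrive already classified, strings stand in for Lucene.Net query objects, terms inside each group are assumed to use `Occur.SHOULD`, and `+` marks a required clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Lucene.Net): describes the query shape built
// from terms that have already been classified as high- or low-frequency.
static string DescribeShape(IEnumerable<(string Term, bool IsHighFreq)> terms)
{
    var list = terms.ToList();
    var low = list.Where(t => !t.IsHighFreq).Select(t => t.Term).ToList();
    var high = list.Where(t => t.IsHighFreq).Select(t => t.Term).ToList();

    if (low.Count == 0)
    {
        // Only high-frequency terms: rewritten into a conjunction,
        // so every high-frequency term must match.
        return "(" + string.Join(" ", high.Select(t => "+" + t)) + ")";
    }
    if (high.Count == 0)
    {
        // Only low-frequency terms: the low-frequency group on its own.
        return "(" + string.Join(" ", low) + ")";
    }
    // Both: the low-frequency group is required; the high-frequency group is
    // optional and only adds score to documents that match the required group.
    return "+(" + string.Join(" ", low) + ") (" + string.Join(" ", high) + ")";
}

Console.WriteLine(DescribeShape(new[] { ("the", true), ("microsoft", false), ("office", false) })); // +(microsoft office) (the)
Console.WriteLine(DescribeShape(new[] { ("the", true), ("a", true) }));                             // (+the +a)
```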
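Collection initializer note: because CommonTermsQuery implements IEnumerable<Term> and provides Add(Term), you can create and populate it in a single statement with a collection initializer. The class has no parameterless constructor, so one of the constructors listed below must be called; the Occur values and the 0.01 threshold used here are only illustrative:
```csharp
using Lucene.Net.Index;   // Term
using Lucene.Net.Queries; // CommonTermsQuery
using Lucene.Net.Search;  // Occur

var query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, 0.01f) {
    new Term("field", "microsoft"),
    new Term("field", "office")
};
```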
Inherited Members
Namespace: Lucene.Net.Queries
Assembly: Lucene.Net.Queries.dll
Syntax
public class CommonTermsQuery : Query, IEnumerable<Term>, IEnumerable
Constructors
CommonTermsQuery(Occur, Occur, float)
Creates a new CommonTermsQuery
Declaration
public CommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency)
Parameters
| Type | Name | Description |
|---|---|---|
| Occur | highFreqOccur | Lucene.Net.Search.Occur used for high frequency terms |
| Occur | lowFreqOccur | Lucene.Net.Search.Occur used for low frequency terms |
| float | maxTermFrequency | a value in [0..1) (or an absolute number >= 1) representing the maximum threshold of a term's document frequency for it to be considered a low-frequency term. |
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | if Lucene.Net.Search.Occur.MUST_NOT is passed as highFreqOccur or lowFreqOccur |
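The two ways of expressing maxTermFrequency can be illustrated with a small sketch. `IsHighFrequency` is a hypothetical helper, not part of the Lucene.Net API; it assumes that a fractional threshold is rounded up to a whole number of documents and that a term whose document frequency exceeds the threshold is treated as high-frequency.

```csharp
using System;

// Hypothetical helper (illustration only): decides whether a term counts as
// high-frequency. maxTermFrequency < 1 is read as a fraction of maxDoc;
// a value >= 1 is read as an absolute document count.
static bool IsHighFrequency(int docFreq, int maxDoc, float maxTermFrequency)
{
    if (maxTermFrequency >= 1f)
        return docFreq > maxTermFrequency;
    // Round the fractional threshold up to a whole number of documents.
    int threshold = (int)Math.Ceiling(maxTermFrequency * maxDoc);
    return docFreq > threshold;
}

// 1% of 10,000 documents gives a threshold of 100 documents.
Console.WriteLine(IsHighFrequency(docFreq: 5_000, maxDoc: 10_000, maxTermFrequency: 0.01f)); // True
Console.WriteLine(IsHighFrequency(docFreq: 42, maxDoc: 10_000, maxTermFrequency: 0.01f));    // False
// Absolute threshold: appearing in more than 50 documents makes a term high-frequency.
Console.WriteLine(IsHighFrequency(docFreq: 51, maxDoc: 10_000, maxTermFrequency: 50f));      // True
```

A fraction adapts automatically as the index grows, while an absolute count keeps the cut-off fixed regardless of index size.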
CommonTermsQuery(Occur, Occur, float, bool)
Creates a new CommonTermsQuery
Declaration
public CommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency, bool disableCoord)
Parameters
| Type | Name | Description |
|---|---|---|
| Occur | highFreqOccur | Lucene.Net.Search.Occur used for high frequency terms |
| Occur | lowFreqOccur | Lucene.Net.Search.Occur used for low frequency terms |
| float | maxTermFrequency | a value in [0..1) (or an absolute number >= 1) representing the maximum threshold of a term's document frequency for it to be considered a low-frequency term. |
| bool | disableCoord | disables Coord(int, int) in scoring for the low- and high-frequency sub-queries |
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | if Lucene.Net.Search.Occur.MUST_NOT is passed as highFreqOccur or lowFreqOccur |
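To see how the nesting changes the coord factor, here is a sketch that assumes the classic DefaultSimilarity formula, overlap / maxOverlap; the `Coord` helper is hypothetical. Take a document that matches the low-frequency terms "microsoft" and "office" but not the high-frequency term "the". A flat BooleanQuery over all three terms applies a factor of 2/3 (about 0.667). In the nested structure the top-level coord is always disabled and the low-frequency sub-query matches 2 of its 2 clauses, so the combined factor is 1.

```csharp
using System;

// Hypothetical helper assuming the classic DefaultSimilarity coord:
// overlap / maxOverlap. disableCoord forces the factor to 1.
static float Coord(int overlap, int maxOverlap, bool disableCoord = false) =>
    disableCoord ? 1f : overlap / (float)maxOverlap;

// Plain BooleanQuery with three optional term clauses: 2 of 3 match.
float flat = Coord(overlap: 2, maxOverlap: 3);

// Nested form: the top-level query (high SHOULD + low MUST) always disables
// coord, and the low-frequency sub-query sees 2 of its 2 clauses matching.
float topLevel = Coord(overlap: 1, maxOverlap: 2, disableCoord: true);
float lowFreq = Coord(overlap: 2, maxOverlap: 2);

Console.WriteLine(flat);
Console.WriteLine(topLevel * lowFreq);
```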
Fields
m_disableCoord
Declaration
protected readonly bool m_disableCoord
Field Value
| Type | Description |
|---|---|
| bool | |
m_highFreqBoost
Declaration
protected float m_highFreqBoost
Field Value
| Type | Description |
|---|---|
| float | |
m_highFreqMinNrShouldMatch
Declaration
protected float m_highFreqMinNrShouldMatch
Field Value
| Type | Description |
|---|---|
| float | |
m_highFreqOccur
Declaration
protected readonly Occur m_highFreqOccur
Field Value
| Type | Description |
|---|---|
| Occur | |
m_lowFreqBoost
Declaration
protected float m_lowFreqBoost
Field Value
| Type | Description |
|---|---|
| float | |
m_lowFreqMinNrShouldMatch
Declaration
protected float m_lowFreqMinNrShouldMatch
Field Value
| Type | Description |
|---|---|
| float | |
m_lowFreqOccur
Declaration
protected readonly Occur m_lowFreqOccur
Field Value
| Type | Description |
|---|---|
| Occur | |
m_maxTermFrequency
Declaration
protected readonly float m_maxTermFrequency
Field Value
| Type | Description |
|---|---|
| float | |
m_terms
Declaration
protected readonly IList<Term> m_terms
Field Value
| Type | Description |
|---|---|
| IList<Term> | |
Properties
HighFreqMinimumNumberShouldMatch
Gets or sets a minimum number of the high-frequency optional BooleanClauses which must be satisfied in order to produce a match on the high-frequency terms query part. This property accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >= 1 as an absolute number of clauses that need to match.
By default, no optional clauses are necessary for a match (unless there are no required clauses). If this property is set, the specified number of clauses is required.
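The fraction-or-count rule can be sketched as a small conversion from the float setting to a whole number of clauses. `MinShouldMatch` is a hypothetical helper; rounding a fraction to the nearest whole clause (with halves rounded up) is an assumption about how fractional values are resolved, not a statement of the library's exact behavior.

```csharp
using System;

// Hypothetical helper: converts the float setting into a clause count.
// A value in [0..1) is a fraction of the optional clauses; a value >= 1
// is an absolute number of clauses (0 means "no minimum").
static int MinShouldMatch(float minNrShouldMatch, int numOptional)
{
    if (minNrShouldMatch >= 1f || minNrShouldMatch == 0f)
        return (int)minNrShouldMatch;
    // Fraction: scale by the number of optional clauses, round half up.
    return (int)Math.Round(minNrShouldMatch * numOptional, MidpointRounding.AwayFromZero);
}

Console.WriteLine(MinShouldMatch(0.5f, 4)); // 2
Console.WriteLine(MinShouldMatch(0.5f, 3)); // 2
Console.WriteLine(MinShouldMatch(2f, 10));  // 2
```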
Declaration
public virtual float HighFreqMinimumNumberShouldMatch { get; set; }
Property Value
| Type | Description |
|---|---|
| float | |
IsCoordDisabled
Returns true if and only if Coord(int, int) is disabled in scoring for the high- and low-frequency query instances. The top-level query always disables coords.
Declaration
public virtual bool IsCoordDisabled { get; }
Property Value
| Type | Description |
|---|---|
| bool | |
LowFreqMinimumNumberShouldMatch
Gets or sets a minimum number of the low-frequency optional BooleanClauses which must be satisfied in order to produce a match on the low-frequency terms query part. This property accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >= 1 as an absolute number of clauses that need to match.
By default, no optional clauses are necessary for a match (unless there are no required clauses). If this property is set, the specified number of clauses is required.
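The matching rule for a group of optional clauses can be written as a predicate. `OptionalGroupMatches` is a hypothetical helper that only restates the description above; it decides whether the optional clauses let a document through, and a document must still satisfy any required clauses separately.

```csharp
using System;

// Hypothetical predicate: given how many optional (SHOULD) clauses a document
// matched, does the optional group allow a match?
static bool OptionalGroupMatches(int matchedOptional, int minShouldMatch, bool hasRequiredClauses)
{
    if (minShouldMatch > 0)
        return matchedOptional >= minShouldMatch; // an explicit minimum was set
    if (!hasRequiredClauses)
        return matchedOptional >= 1;              // no required clauses: at least one optional must match
    return true;                                  // required clauses decide; optional ones only add score
}

// Three low-frequency terms with a minimum of 2 clauses (absolute value).
Console.WriteLine(OptionalGroupMatches(matchedOptional: 1, minShouldMatch: 2, hasRequiredClauses: false)); // False
Console.WriteLine(OptionalGroupMatches(matchedOptional: 2, minShouldMatch: 2, hasRequiredClauses: false)); // True
```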
Declaration
public virtual float LowFreqMinimumNumberShouldMatch { get; set; }
Property Value
| Type | Description |
|---|---|
| float | |
Methods
Add(Term)
Adds a term to the CommonTermsQuery
Declaration
public virtual void Add(Term term)
Parameters
| Type | Name | Description |
|---|---|---|
| Term | term | the term to add |
BuildQuery(int, TermContext[], Term[])
Declaration
protected virtual Query BuildQuery(int maxDoc, TermContext[] contextArray, Term[] queryTerms)
Parameters
| Type | Name | Description |
|---|---|---|
| int | maxDoc | |
| TermContext[] | contextArray | |
| Term[] | queryTerms | |
Returns
| Type | Description |
|---|---|
| Query | |
CalcHighFreqMinimumNumberShouldMatch(int)
Declaration
protected virtual int CalcHighFreqMinimumNumberShouldMatch(int numOptional)
Parameters
| Type | Name | Description |
|---|---|---|
| int | numOptional | |
Returns
| Type | Description |
|---|---|
| int | |
CalcLowFreqMinimumNumberShouldMatch(int)
Declaration
protected virtual int CalcLowFreqMinimumNumberShouldMatch(int numOptional)
Parameters
| Type | Name | Description |
|---|---|---|
| int | numOptional | |
Returns
| Type | Description |
|---|---|
| int | |
CollectTermContext(IndexReader, IList<AtomicReaderContext>, TermContext[], Term[])
Declaration
public virtual void CollectTermContext(IndexReader reader, IList<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms)
Parameters
| Type | Name | Description |
|---|---|---|
| IndexReader | reader | |
| IList<AtomicReaderContext> | leaves | |
| TermContext[] | contextArray | |
| Term[] | queryTerms | |
Equals(object)
Determines whether the specified object is equal to the current object.
Declaration
public override bool Equals(object obj)
Parameters
| Type | Name | Description |
|---|---|---|
| object | obj | The object to compare with the current object. |
Returns
| Type | Description |
|---|---|
| bool | true if the specified object is equal to the current object; otherwise, false. |
Overrides
ExtractTerms(ISet<Term>)
Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten (Lucene.Net.Search.Query.Rewrite(Lucene.Net.Index.IndexReader)) form.
Declaration
public override void ExtractTerms(ISet<Term> terms)
Parameters
| Type | Name | Description |
|---|---|---|
| ISet<Term> | terms | |
Overrides
Exceptions
| Type | Condition |
|---|---|
| InvalidOperationException | If this query is not yet rewritten |
GetEnumerator()
Returns an enumerator that iterates through the m_terms collection.
Declaration
public IEnumerator<Term> GetEnumerator()
Returns
| Type | Description |
|---|---|
| IEnumerator<Term> | An enumerator that can be used to iterate through the m_terms collection. |
GetHashCode()
Serves as the default hash function.
Declaration
public override int GetHashCode()
Returns
| Type | Description |
|---|---|
| int | A hash code for the current object. |
Overrides
NewTermQuery(Term, TermContext)
Builds a new Lucene.Net.Search.TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
Declaration
protected virtual Query NewTermQuery(Term term, TermContext context)
Parameters
| Type | Name | Description |
|---|---|---|
| Term | term | term |
| TermContext | context | the Lucene.Net.Index.TermContext to be used to create the low-level term query. Can be null. |
Returns
| Type | Description |
|---|---|
| Query | new Lucene.Net.Search.TermQuery instance |
Rewrite(IndexReader)
Expert: called to rewrite queries into primitive queries. For example, a Lucene.Net.Search.PrefixQuery will be rewritten into a Lucene.Net.Search.BooleanQuery that consists of Lucene.Net.Search.TermQuery instances.
Declaration
public override Query Rewrite(IndexReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| IndexReader | reader | |
Returns
| Type | Description |
|---|---|
| Query | |
Overrides
ToString(string)
Prints a query to a string, with field assumed to be the default field and omitted.
Declaration
public override string ToString(string field)
Parameters
| Type | Name | Description |
|---|---|---|
| string | field | |
Returns
| Type | Description |
|---|---|
| string | |