Class CommonTermsQuery
A query that executes high-frequency terms in an optional sub-query to prevent
slow queries caused by "common" terms such as stopwords. This query
builds two queries from the terms added via Add(Term): low-frequency
terms are added to a required boolean clause and high-frequency terms are
added to an optional boolean clause. The optional clause is only executed if
the required "low-frequency" clause matches. Scores produced by this query
differ slightly from those of a plain BooleanQuery scorer, mainly because the
Coord(Int32, Int32) factor depends on the number of leaf queries
in the required boolean clause. In most cases, high-frequency terms are
unlikely to contribute significantly to the document score unless at least
one of the low-frequency terms is matched. Where applicable, this query can
improve query execution times significantly.
CommonTermsQuery has several advantages over stopword filtering at
index or query time: a term is "classified" based on its actual
document frequency in the index, so slow queries can be prevented even across
domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is
rewritten into a plain conjunction query, i.e. all high-frequency terms need to
match in order to match a document.
Collection initializer note: To create and populate a CommonTermsQuery
in a single statement, you can use the following example as a guide
(the constructor arguments shown are illustrative):
var query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, 0.1f) {
    new Term("field", "microsoft"),
    new Term("field", "office")
};
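As a rough illustration of the summary above, the following sketch builds a query in which low-frequency terms are required and high-frequency terms are optional. The field name "body", the sample terms, the threshold 0.01f, and the class name CommonTermsQueryExample are all assumptions made for the example. Classifying each term as low- or high-frequency needs an index, so nothing is classified while the terms are being added.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Queries;
using Lucene.Net.Search;

public static class CommonTermsQueryExample
{
    // Builds a CommonTermsQuery over a hypothetical "body" field.
    public static CommonTermsQuery Build()
    {
        // highFreqOccur = SHOULD: high-frequency terms go into the optional clause.
        // lowFreqOccur  = MUST:   low-frequency terms go into the required clause.
        // 0.01f is an illustrative maxTermFrequency given as a fraction in [0..1).
        var query = new CommonTermsQuery(Occur.SHOULD, Occur.MUST, 0.01f);

        // Terms are only collected here; no index is consulted yet.
        query.Add(new Term("body", "the"));
        query.Add(new Term("body", "quick"));
        query.Add(new Term("body", "fox"));
        return query;
    }
}
```

The resulting object is an ordinary Query, so it can be handed to a searcher in the same way as any other query type.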
Inheritance
System.Object
Query
CommonTermsQuery
Implements
System.Collections.Generic.IEnumerable<Term>
System.Collections.IEnumerable
Inherited Members
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Assembly: Lucene.Net.Queries.dll
Syntax
public class CommonTermsQuery : Query, IEnumerable<Term>, IEnumerable
Constructors
CommonTermsQuery(Occur, Occur, Single)
Declaration
public CommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency)
Parameters
Type |
Name |
Description |
Occur |
highFreqOccur |
Occur used for high frequency terms
|
Occur |
lowFreqOccur |
Occur used for low frequency terms
|
System.Single |
maxTermFrequency |
a value in [0..1) (or an absolute number >= 1) representing the
maximum threshold of a term's document frequency for it to be considered a
low-frequency term.
|
Exceptions
Type |
Condition |
System.ArgumentException |
if MUST_NOT is passed as lowFreqOccur or
highFreqOccur
|
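The two interpretations of maxTermFrequency (a fraction in [0..1) or an absolute count >= 1) can be illustrated with a small standalone sketch. It does not call Lucene.NET. FrequencyClassifier and IsHighFrequency are hypothetical names, and the rule shown, including rounding the fractional limit up, is an assumption derived from the parameter description rather than a copy of the library's code.

```csharp
using System;

public static class FrequencyClassifier
{
    // Hypothetical helper: returns true when a term would be treated as
    // high-frequency under the given threshold.
    //   docFreq          - number of documents that contain the term
    //   maxDoc           - total number of documents in the index
    //   maxTermFrequency - fraction in [0..1) or absolute count >= 1
    public static bool IsHighFrequency(int docFreq, int maxDoc, float maxTermFrequency)
    {
        if (maxTermFrequency >= 1f)
        {
            // Absolute threshold: compare document counts directly.
            return docFreq > maxTermFrequency;
        }

        // Relative threshold: convert the fraction into a document count
        // (rounded up, which is an assumption of this sketch).
        int limit = (int)Math.Ceiling(maxTermFrequency * (double)maxDoc);
        return docFreq > limit;
    }
}
```

Under these assumptions, with a 1,000-document index and a threshold of 0.1f, a term found in 150 documents is classified as high-frequency and a term found in 50 documents as low-frequency.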
CommonTermsQuery(Occur, Occur, Single, Boolean)
Declaration
public CommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency, bool disableCoord)
Parameters
Type |
Name |
Description |
Occur |
highFreqOccur |
Occur used for high frequency terms
|
Occur |
lowFreqOccur |
Occur used for low frequency terms
|
System.Single |
maxTermFrequency |
a value in [0..1) (or an absolute number >= 1) representing the
maximum threshold of a term's document frequency for it to be considered a
low-frequency term.
|
System.Boolean |
disableCoord |
disables Coord(Int32, Int32) in scoring for the low- and
high-frequency sub-queries
|
Exceptions
Type |
Condition |
System.ArgumentException |
if MUST_NOT is passed as lowFreqOccur or
highFreqOccur
|
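The argument check documented in the Exceptions table can be exercised with a short sketch. OccurValidationExample and IsRejected are hypothetical names, and the threshold 0.1f is arbitrary. The example only relies on the documented behavior that MUST_NOT is rejected with a System.ArgumentException.

```csharp
using System;
using Lucene.Net.Queries;
using Lucene.Net.Search;

public static class OccurValidationExample
{
    // Returns true if constructing the query with the given Occur values
    // is rejected with an ArgumentException, false if construction succeeds.
    public static bool IsRejected(Occur highFreqOccur, Occur lowFreqOccur)
    {
        try
        {
            // 0.1f is an arbitrary threshold chosen for illustration.
            _ = new CommonTermsQuery(highFreqOccur, lowFreqOccur, 0.1f, disableCoord: true);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Validating the Occur values once at construction time means a misconfigured query fails immediately rather than later during rewriting or searching.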
Fields
m_disableCoord
Declaration
protected readonly bool m_disableCoord
Field Value
Type |
Description |
System.Boolean |
|
m_highFreqBoost
Declaration
protected float m_highFreqBoost
Field Value
Type |
Description |
System.Single |
|
m_highFreqMinNrShouldMatch
Declaration
protected float m_highFreqMinNrShouldMatch
Field Value
Type |
Description |
System.Single |
|
m_highFreqOccur
Declaration
protected readonly Occur m_highFreqOccur
Field Value
Type |
Description |
Occur |
|
m_lowFreqBoost
Declaration
protected float m_lowFreqBoost
Field Value
Type |
Description |
System.Single |
|
m_lowFreqMinNrShouldMatch
Declaration
protected float m_lowFreqMinNrShouldMatch
Field Value
Type |
Description |
System.Single |
|
m_lowFreqOccur
Declaration
protected readonly Occur m_lowFreqOccur
Field Value
Type |
Description |
Occur |
|
m_maxTermFrequency
Declaration
protected readonly float m_maxTermFrequency
Field Value
Type |
Description |
System.Single |
|
m_terms
Declaration
protected readonly IList<Term> m_terms
Field Value
Type |
Description |
System.Collections.Generic.IList<Term> |
|
Properties
HighFreqMinimumNumberShouldMatch
Gets or sets the minimum number of high-frequency optional BooleanClauses that must be
satisfied in order to produce a match on the high-frequency terms query
part. This property accepts a float value in the range [0..1) as a fraction
of the actual query terms in the high-frequency clause, or a number
>= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are
no required clauses). If this property is set, then the specified number of
clauses is required.
Declaration
public virtual float HighFreqMinimumNumberShouldMatch { get; set; }
Property Value
Type |
Description |
System.Single |
|
IsCoordDisabled
Returns true if and only if Coord(Int32, Int32) is disabled in scoring
for the high- and low-frequency query instances. The top-level query
always disables coords.
Declaration
public virtual bool IsCoordDisabled { get; }
Property Value
Type |
Description |
System.Boolean |
|
LowFreqMinimumNumberShouldMatch
Gets or sets the minimum number of low-frequency optional BooleanClauses that must be
satisfied in order to produce a match on the low-frequency terms query
part. This property accepts a float value in the range [0..1) as a fraction
of the actual query terms in the low-frequency clause, or a number
>= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are
no required clauses). If this property is set, then the specified number of
clauses is required.
Declaration
public virtual float LowFreqMinimumNumberShouldMatch { get; set; }
Property Value
Type |
Description |
System.Single |
|
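The two ways of specifying a minimum-should-match value (a fraction in [0..1) or an absolute count >= 1) can be made concrete with a standalone sketch. MinShouldMatchSketch and Resolve are hypothetical names. The rounding of the fractional case to the nearest integer is an assumption of this example and may differ in detail from what the library does.

```csharp
using System;

public static class MinShouldMatchSketch
{
    // Hypothetical helper: converts a property value into a clause count.
    //   spec        - value assigned to the property (fraction or absolute count)
    //   numOptional - number of optional clauses in the sub-query
    public static int Resolve(float spec, int numOptional)
    {
        if (spec >= 1f || spec == 0f)
        {
            // Absolute count, or 0 meaning that no optional clause is required.
            return (int)spec;
        }

        // Fraction of the available optional clauses, rounded to the
        // nearest integer (the rounding mode is an assumption of this sketch).
        return (int)Math.Round(spec * (double)numOptional);
    }
}
```

Under these assumptions, a value of 0.75f with four optional clauses requires three of them to match, while a value of 2f requires two clauses regardless of how many are present.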
Methods
Add(Term)
Declaration
public virtual void Add(Term term)
Parameters
Type |
Name |
Description |
Term |
term |
the term to add
|
BuildQuery(Int32, TermContext[], Term[])
Declaration
protected virtual Query BuildQuery(int maxDoc, TermContext[] contextArray, Term[] queryTerms)
Parameters
Type |
Name |
Description |
System.Int32 |
maxDoc |
|
TermContext[] |
contextArray |
|
Term[] |
queryTerms |
|
Returns
Type |
Description |
Query |
|
CalcHighFreqMinimumNumberShouldMatch(Int32)
Declaration
protected virtual int CalcHighFreqMinimumNumberShouldMatch(int numOptional)
Parameters
Type |
Name |
Description |
System.Int32 |
numOptional |
|
Returns
Type |
Description |
System.Int32 |
|
CalcLowFreqMinimumNumberShouldMatch(Int32)
Declaration
protected virtual int CalcLowFreqMinimumNumberShouldMatch(int numOptional)
Parameters
Type |
Name |
Description |
System.Int32 |
numOptional |
|
Returns
Type |
Description |
System.Int32 |
|
CollectTermContext(IndexReader, IList<AtomicReaderContext>, TermContext[], Term[])
Declaration
public virtual void CollectTermContext(IndexReader reader, IList<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms)
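Parameters
Type |
Name |
Description |
IndexReader |
reader |
|
System.Collections.Generic.IList<AtomicReaderContext> |
leaves |
|
TermContext[] |
contextArray |
|
Term[] |
queryTerms |
|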
Equals(Object)
Declaration
public override bool Equals(object obj)
Parameters
Type |
Name |
Description |
System.Object |
obj |
|
Returns
Type |
Description |
System.Boolean |
|
Overrides
Query.Equals(Object)
ExtractTerms(ISet<Term>)
Declaration
public override void ExtractTerms(ISet<Term> terms)
Parameters
Type |
Name |
Description |
System.Collections.Generic.ISet<Term> |
terms |
|
Overrides
Query.ExtractTerms(ISet<Term>)
GetEnumerator()
Returns an enumerator that iterates through the m_terms collection.
Declaration
public IEnumerator<Term> GetEnumerator()
Returns
Type |
Description |
System.Collections.Generic.IEnumerator<Term> |
An enumerator that can be used to iterate through the m_terms collection.
|
GetHashCode()
Declaration
public override int GetHashCode()
Returns
Type |
Description |
System.Int32 |
|
Overrides
Query.GetHashCode()
NewTermQuery(Term, TermContext)
Builds a new TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
Declaration
protected virtual Query NewTermQuery(Term term, TermContext context)
Parameters
Type |
Name |
Description |
Term |
term |
term
|
TermContext |
context |
the TermContext to be used to create the low-level term query. Can be null.
|
Returns
Type |
Description |
Query |
|
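Since NewTermQuery is described as an extension point for subclasses, a minimal sketch of such a subclass may help. BoostedCommonTermsQuery and its termBoost parameter are hypothetical. The example also assumes that Query exposes a settable Boost property, as it does in Lucene.NET 4.8.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Queries;
using Lucene.Net.Search;

// Hypothetical subclass that applies a fixed boost to every generated term query.
public class BoostedCommonTermsQuery : CommonTermsQuery
{
    private readonly float termBoost;

    public BoostedCommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur,
        float maxTermFrequency, float termBoost)
        : base(highFreqOccur, lowFreqOccur, maxTermFrequency)
    {
        this.termBoost = termBoost;
    }

    protected override Query NewTermQuery(Term term, TermContext context)
    {
        // Let the base class build the term query (context may be null),
        // then adjust the result before it is added to a sub-query.
        Query query = base.NewTermQuery(term, context);
        query.Boost = termBoost;
        return query;
    }
}
```

Delegating to the base implementation first keeps the default handling of a null context and limits the override to the customization itself.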
Rewrite(IndexReader)
Declaration
public override Query Rewrite(IndexReader reader)
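Parameters
Type |
Name |
Description |
IndexReader |
reader |
|
Returns
Type |
Description |
Query |
|
Overrides
Query.Rewrite(IndexReader)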
ToString(String)
Declaration
public override string ToString(string field)
Parameters
Type |
Name |
Description |
System.String |
field |
|
Returns
Type |
Description |
System.String |
|
Overrides
Query.ToString(String)
Explicit Interface Implementations
IEnumerable.GetEnumerator()
Returns an enumerator that iterates through the m_terms collection.
Declaration
IEnumerator IEnumerable.GetEnumerator()
Returns
Type |
Description |
System.Collections.IEnumerator |
An enumerator that can be used to iterate through the m_terms collection.
|
Implements
System.Collections.Generic.IEnumerable<T>
System.Collections.IEnumerable