Namespace Lucene.Net.Search.Spans
The calculus of spans.
A span is a <doc,startPosition,endPosition> tuple.
The following span query operators are implemented:
- A SpanTermQuery matches all spans containing a particular Term.
- A SpanNearQuery matches spans which occur near one another, and can be used to implement things like phrase search (when constructed from SpanTermQuerys) and inter-phrase proximity (when constructed from other SpanNearQuerys).
- A SpanOrQuery merges spans from a number of other SpanQuerys.
- A SpanNotQuery removes spans matching one SpanQuery which overlap (or come near) another. This can be used, e.g., to implement within-paragraph search.
- A SpanFirstQuery matches spans matching q whose end position is less than n. This can be used to constrain matches to the first part of the document.
- A SpanPositionRangeQuery is a more general form of SpanFirstQuery that can constrain matches to arbitrary portions of the document.
In all cases, output spans are minimally inclusive. In other words, a span formed by matching a span in x and y starts at the lesser of the two starts and ends at the greater of the two ends.
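The "minimally inclusive" rule can be sketched in a few lines of plain C#. This is an illustrative model only: the SpanSketch class, its Cover method, and the tuple layout are hypothetical names introduced here, not Lucene.NET types.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class SpanSketch
{
    // Returns the smallest span that covers both x and y:
    // it starts at the lesser start and ends at the greater end.
    public static (int Doc, int Start, int End) Cover(
        (int Doc, int Start, int End) x,
        (int Doc, int Start, int End) y)
    {
        if (x.Doc != y.Doc)
        {
            throw new ArgumentException("Both spans must come from the same document.");
        }

        return (x.Doc, Math.Min(x.Start, y.Start), Math.Max(x.End, y.End));
    }
}
```

With made-up positions, SpanSketch.Cover((7, 3, 5), (7, 10, 12)) yields (7, 3, 12): the combined span reaches from the earlier start to the later end and no further.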
For example, a span query which matches "John Kerry" within ten words of "George Bush" within the first 100 words of the document could be constructed with:
SpanQuery john = new SpanTermQuery(new Term("content", "john"));
SpanQuery kerry = new SpanTermQuery(new Term("content", "kerry"));
SpanQuery george = new SpanTermQuery(new Term("content", "george"));
SpanQuery bush = new SpanTermQuery(new Term("content", "bush"));
SpanQuery johnKerry =
    new SpanNearQuery(new SpanQuery[] { john, kerry }, 0, true);
SpanQuery georgeBush =
    new SpanNearQuery(new SpanQuery[] { george, bush }, 0, true);
SpanQuery johnKerryNearGeorgeBush =
    new SpanNearQuery(new SpanQuery[] { johnKerry, georgeBush }, 10, false);
SpanQuery johnKerryNearGeorgeBushAtStart =
    new SpanFirstQuery(johnKerryNearGeorgeBush, 100);
Span queries may be freely intermixed with other Lucene queries. So, for example, the above query can be restricted to documents which also use the word "iraq" with:
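Query query = new BooleanQuery
{
    // Both clauses are required, so matching documents must also contain "iraq".
    { johnKerryNearGeorgeBushAtStart, Occur.MUST },
    { new TermQuery(new Term("content", "iraq")), Occur.MUST }
};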
Classes
FieldMaskingSpanQuery
Wrapper to allow SpanQuery objects to participate in composite single-field SpanQueries by 'lying' about their search field. That is, the masked SpanQuery will function as normal, but Field simply hands back the value supplied in this class's constructor.
This can be used to support Queries like SpanNearQuery or SpanOrQuery across different fields, which is not ordinarily permitted.
This can be useful for denormalized relational data: for example, when indexing a document with conceptually many 'children':
teacherid: 1
studentfirstname: james
studentsurname: jones
teacherid: 2
studentfirstname: james
studentsurname: smith
studentfirstname: sally
studentsurname: jones
A SpanNearQuery with a slop of -1 (so that both terms must share the same position) can be applied across two SpanTermQuery objects as follows:
SpanQuery q1 = new SpanTermQuery(new Term("studentfirstname", "james"));
SpanQuery q2 = new SpanTermQuery(new Term("studentsurname", "jones"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname");
Query q = new SpanNearQuery(new SpanQuery[] { q1, q2m }, -1, false);
to search for 'studentfirstname:james studentsurname:jones' and find
teacherid 1 without matching teacherid 2 (which has a 'james' in position 0
and 'jones' in position 1).
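The slop value passed in the example can be checked with a small model. The sketch below assumes that unordered matching measures slop as the width of the covering span minus the summed lengths of the sub-spans, and that end positions are exclusive. Both points are assumptions about the implementation rather than statements from this page, and SlopSketch and UnorderedSlop are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical model for illustration; not part of Lucene.NET.
public static class SlopSketch
{
    // Assumed formula: (greatest end - least start) - total length of all sub-spans.
    // Each span is (Start, End) with End exclusive.
    public static int UnorderedSlop(params (int Start, int End)[] spans)
    {
        int minStart = spans.Min(s => s.Start);
        int maxEnd = spans.Max(s => s.End);
        int totalLength = spans.Sum(s => s.End - s.Start);
        return maxEnd - minStart - totalLength;
    }
}
```

Under this model, teacherid 1 has 'james' and 'jones' both at position 0, so SlopSketch.UnorderedSlop((0, 1), (0, 1)) is -1 and satisfies a maximum slop of -1. For teacherid 2, 'james' is at position 0 and 'jones' at position 1, so SlopSketch.UnorderedSlop((0, 1), (1, 2)) is 0, which exceeds -1 and is rejected.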
Note: as Field returns the masked field, scoring will be done using the Similarity and collection statistics of the field name supplied, but with the term statistics of the real field. This may lead to exceptions, poor performance, and unexpected scoring behavior.
NearSpansOrdered
A Spans that is formed from the ordered subspans of a SpanNearQuery where the subspans do not overlap and have a maximum slop between them.
The formed spans only contain minimum slop matches.
The matching slop is computed from the distance(s) between the non-overlapping matching Spans.
Successive matches are always formed from the successive Spans of the SpanNearQuery.
The formed spans may contain overlaps when the slop is at least 1.
For example, when querying using
t1 t2 t3
with slop at least 1, the fragment:
t1 t2 t1 t3 t2 t3
matches twice:
t1 t2 .. t3
t1 .. t2 t3
Expert: Only public for subclassing. Most implementations should not need this class.
NearSpansUnordered
Similar to NearSpansOrdered, but for the unordered case.
Expert: Only public for subclassing. Most implementations should not need this class.
SpanFirstQuery
Matches spans near the beginning of a field.
This class is a simple extension of SpanPositionRangeQuery in that it assumes the start to be zero and only checks the end boundary.
SpanMultiTermQueryWrapper<Q>
Wraps any MultiTermQuery as a SpanQuery, so it can be nested within other SpanQuery classes.
The query is rewritten by default to a SpanOrQuery containing the expanded terms, but this can be customized.
Example:
WildcardQuery wildcard = new WildcardQuery(new Term("field", "bro?n"));
SpanQuery spanWildcard = new SpanMultiTermQueryWrapper<WildcardQuery>(wildcard);
// do something with spanWildcard, such as use it in a SpanFirstQuery
SpanMultiTermQueryWrapper<Q>.TopTermsSpanBooleanQueryRewrite
A rewrite method that first translates each term into a SpanTermQuery in a SHOULD clause in a BooleanQuery, and keeps the scores as computed by the query.
This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count.
SpanNearPayloadCheckQuery
Only return those matches that have a specific payload at the given position.
SpanNearQuery
Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order.
SpanNotQuery
Removes matches which overlap with another SpanQuery, or which fall within x tokens before or y tokens after another SpanQuery.
SpanOrQuery
Matches the union of its clauses.
SpanPayloadCheckQuery
Only return those matches that have a specific payload at the given position.
Do not use this with a SpanQuery that contains a SpanNearQuery. Instead, use SpanNearPayloadCheckQuery since it properly handles the fact that payloads aren't ordered by SpanNearQuery.
SpanPositionCheckQuery
Base class for filtering a SpanQuery based on the position of a match.
SpanPositionCheckQuery.PositionCheckSpan
SpanPositionRangeQuery
Checks to see if the match lies between a start and end position.
SpanQuery
Base class for span-based queries.
SpanRewriteMethod
Abstract class that defines how the query is rewritten.
Spans
Expert: an enumeration of span matches. Used to implement span searching. Each span represents a range of term positions within a document. Matches are enumerated in order, by increasing document number, within that by increasing start position and finally by increasing end position.
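The enumeration order described above can be modeled as a three-key sort. This is a sketch over plain tuples, not the Spans API itself; SpanOrderSketch and InEnumerationOrder are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class SpanOrderSketch
{
    // Orders spans by document number, then start position, then end position.
    public static List<(int Doc, int Start, int End)> InEnumerationOrder(
        IEnumerable<(int Doc, int Start, int End)> spans)
    {
        return spans
            .OrderBy(s => s.Doc)
            .ThenBy(s => s.Start)
            .ThenBy(s => s.End)
            .ToList();
    }
}
```

For instance, the made-up spans (2, 0, 1), (1, 4, 6), (1, 4, 5) and (1, 2, 9) come out as (1, 2, 9), (1, 4, 5), (1, 4, 6), (2, 0, 1).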
SpanScorer
Public for extension only.
SpanTermQuery
Matches spans containing a term.
SpanWeight
Expert-only. Public for use by other weight implementations.
TermSpans
Expert: Public for extension only.
Interfaces
ISpanMultiTermQueryWrapper
LUCENENET specific interface for referring to/identifying a SpanMultiTermQueryWrapper<Q> without referring to its generic closing type.
Enums
SpanPositionCheckQuery.AcceptStatus
Return value for AcceptPosition(Spans).