Namespace Lucene.Net.Search.Spans
The calculus of spans.
A span is a <doc,startPosition,endPosition> tuple.
The following span query operators are implemented:
- A SpanTermQuery matches all spans containing a particular Term.
- A SpanNearQuery matches spans which occur near one another, and can be used to implement things like phrase search (when constructed from SpanTermQuery objects) and inter-phrase proximity (when constructed from other SpanNearQuery objects).
- A SpanOrQuery merges spans from a number of other SpanQuery objects.
- A SpanNotQuery removes spans matching one SpanQuery which overlap (or come near) another. This can be used, e.g., to implement within-paragraph search.
- A SpanFirstQuery matches spans matching q whose end position is less than n. This can be used to constrain matches to the first part of the document.
- A SpanPositionRangeQuery is a more general form of SpanFirstQuery that can constrain matches to arbitrary portions of the document.
In all cases, output spans are minimally inclusive. In other words, a span formed by matching a span in x and y starts at the lesser of the two starts and ends at the greater of the two ends.
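The "minimally inclusive" rule can be illustrated with a small stand-alone sketch. The SpanModel type and its Combine method are hypothetical names introduced only for this illustration; they are not part of Lucene.NET, and the sketch assumes both spans come from the same document.

```csharp
using System;

// Hypothetical model type for illustration only; not a Lucene.NET class.
public readonly struct SpanModel
{
    public int Doc { get; }
    public int Start { get; }
    public int End { get; }

    public SpanModel(int doc, int start, int end)
    {
        Doc = doc;
        Start = start;
        End = end;
    }

    // Returns the smallest span that covers both x and y:
    // the lesser of the two starts and the greater of the two ends.
    public static SpanModel Combine(SpanModel x, SpanModel y)
    {
        if (x.Doc != y.Doc)
        {
            throw new ArgumentException("Spans must belong to the same document.");
        }
        return new SpanModel(x.Doc, Math.Min(x.Start, y.Start), Math.Max(x.End, y.End));
    }
}
```

For instance, combining a span covering positions 0 to 2 with one covering positions 4 to 6 in the same document yields a span from 0 to 6.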
For example, a span query which matches "John Kerry" within ten words of "George Bush" within the first 100 words of the document could be constructed with:
SpanQuery john = new SpanTermQuery(new Term("content", "john"));
SpanQuery kerry = new SpanTermQuery(new Term("content", "kerry"));
SpanQuery george = new SpanTermQuery(new Term("content", "george"));
SpanQuery bush = new SpanTermQuery(new Term("content", "bush"));
SpanQuery johnKerry =
    new SpanNearQuery(new SpanQuery[] { john, kerry }, 0, true);
SpanQuery georgeBush =
    new SpanNearQuery(new SpanQuery[] { george, bush }, 0, true);
SpanQuery johnKerryNearGeorgeBush =
    new SpanNearQuery(new SpanQuery[] { johnKerry, georgeBush }, 10, false);
SpanQuery johnKerryNearGeorgeBushAtStart =
    new SpanFirstQuery(johnKerryNearGeorgeBush, 100);
Span queries may be freely intermixed with other Lucene queries. So, for example, the above query can be restricted to documents which also use the word "iraq" with:
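// Both clauses are required, so only documents matching the span query AND containing "iraq" are returned.
Query query = new BooleanQuery
{
    { johnKerryNearGeorgeBushAtStart, Occur.MUST },
    { new TermQuery(new Term("content", "iraq")), Occur.MUST }
};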
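To see the combined query at work, the following is a minimal end-to-end sketch. It assumes Lucene.NET 4.8 with the Lucene.Net and Lucene.Net.Analysis.Common packages referenced; the class name SpanSearchSketch, the in-memory index, and the three sample sentences are illustrative assumptions rather than anything prescribed by this namespace.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Spans;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical demo class; assumes the Lucene.NET 4.8 API.
public static class SpanSearchSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    public static int CountMatches()
    {
        using var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(MatchVersion);

        // Index three small sample documents (made-up text).
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(MatchVersion, analyzer)))
        {
            string[] texts =
            {
                "john kerry spoke before george bush about iraq",
                "george bush spoke about iraq",
                "john kerry debated george bush"
            };
            foreach (string text in texts)
            {
                var doc = new Document();
                doc.Add(new TextField("content", text, Field.Store.NO));
                writer.AddDocument(doc);
            }
        }

        // Build the span query described in the text.
        SpanQuery john = new SpanTermQuery(new Term("content", "john"));
        SpanQuery kerry = new SpanTermQuery(new Term("content", "kerry"));
        SpanQuery george = new SpanTermQuery(new Term("content", "george"));
        SpanQuery bush = new SpanTermQuery(new Term("content", "bush"));
        SpanQuery johnKerry = new SpanNearQuery(new SpanQuery[] { john, kerry }, 0, true);
        SpanQuery georgeBush = new SpanNearQuery(new SpanQuery[] { george, bush }, 0, true);
        SpanQuery near = new SpanNearQuery(new SpanQuery[] { johnKerry, georgeBush }, 10, false);
        SpanQuery atStart = new SpanFirstQuery(near, 100);

        // Require both the span match and the term "iraq".
        var query = new BooleanQuery();
        query.Add(atStart, Occur.MUST);
        query.Add(new TermQuery(new Term("content", "iraq")), Occur.MUST);

        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);
        return searcher.Search(query, 10).TotalHits;
    }
}
```

With these sample sentences, only the first document satisfies both clauses: the second lacks "john kerry" and the third lacks "iraq".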
Classes
FieldMaskingSpanQuery
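Wrapper to allow SpanQuery objects to participate in composite single-field SpanQueries by 'lying' about their search field. That is, the masked SpanQuery will function as normal, but Field simply hands back the value supplied in this class's constructor.
This can be used to support Queries like SpanNearQuery or SpanOrQuery across different fields, which is not ordinarily permitted.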
This can be useful for denormalized relational data: for example, when indexing a document with conceptually many 'children':
teacherid: 1
studentfirstname: james
studentsurname: jones
teacherid: 2
studentfirstname: james
studentsurname: smith
studentfirstname: sally
studentsurname: jones
A SpanNearQuery can be applied across two SpanTermQuery objects as follows:
SpanQuery q1 = new SpanTermQuery(new Term("studentfirstname", "james"));
SpanQuery q2 = new SpanTermQuery(new Term("studentsurname", "jones"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname");
Query q = new SpanNearQuery(new SpanQuery[] { q1, q2m }, -1, false);
to search for 'studentfirstname:james studentsurname:jones' and find
teacherid 1 without matching teacherid 2 (which has a 'james' in position 0
and 'jones' in position 1).
Note: as Field returns the masked field, scoring will be done using the Similarity and collection statistics of the field name supplied, but with the term statistics of the real field. This may lead to exceptions, poor performance, and unexpected scoring behavior.
NearSpansOrdered
A Spans that is formed from the ordered subspans of a SpanNearQuery where the subspans do not overlap and have a maximum slop between them.
The formed spans only contain minimum slop matches.
The matching slop is computed from the distance(s) between the non-overlapping matching Spans.
Successive matches are always formed from the successive Spans of the SpanNearQuery.
The formed spans may contain overlaps when the slop is at least 1.
For example, when querying using
t1 t2 t3
with slop at least 1, the fragment:
t1 t2 t1 t3 t2 t3
matches twice:
t1 t2 .. t3
t1 .. t2 t3
Expert: Only public for subclassing. Most implementations should not need this class.
NearSpansUnordered
Similar to NearSpansOrdered, but for the unordered case.
Expert: Only public for subclassing. Most implementations should not need this class.
SpanFirstQuery
Matches spans near the beginning of a field.
This class is a simple extension of SpanPositionRangeQuery in that it assumes the start to be zero and only checks the end boundary.
SpanMultiTermQueryWrapper<Q>
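Wraps any MultiTermQuery as a SpanQuery, so it can be nested within other SpanQuery classes.
The query is rewritten by default to a SpanOrQuery containing the expanded terms, but this can be customized.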
Example:
WildcardQuery wildcard = new WildcardQuery(new Term("field", "bro?n"));
SpanQuery spanWildcard = new SpanMultiTermQueryWrapper<WildcardQuery>(wildcard);
// do something with spanWildcard, such as use it in a SpanFirstQuery
SpanMultiTermQueryWrapper<Q>.TopTermsSpanBooleanQueryRewrite
A rewrite method that first translates each term into a SpanTermQuery in a SHOULD clause in a boolean query, and keeps the scores as computed by the query.
This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count.
SpanNearPayloadCheckQuery
Only return those matches that have a specific payload at the given position.
SpanNearQuery
Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order.
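As a rough illustration of how slop can be counted, the sketch below sums the gaps between successive sub-spans. It is a simplified stand-alone model, not the Lucene.NET implementation: the SlopSketch class and its method names are hypothetical, sub-spans are assumed to be non-overlapping, given in match order, and to use exclusive end positions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not a Lucene.NET class.
public static class SlopSketch
{
    // Counts the unmatched positions lying between successive sub-spans.
    // Each sub-span is a (Start, End) pair with an exclusive End.
    public static int InterveningPositions(IReadOnlyList<(int Start, int End)> subSpans)
    {
        int gaps = 0;
        for (int i = 1; i < subSpans.Count; i++)
        {
            // The gap is the distance from the end of the previous sub-span
            // to the start of the current one; adjacent sub-spans add zero.
            gaps += Math.Max(0, subSpans[i].Start - subSpans[i - 1].End);
        }
        return gaps;
    }

    // A candidate match is accepted when its gap count does not exceed the slop.
    public static bool WithinSlop(IReadOnlyList<(int Start, int End)> subSpans, int slop)
    {
        return InterveningPositions(subSpans) <= slop;
    }
}
```

Under these assumptions, two adjacent terms at positions 0 and 1 have zero intervening positions and satisfy a slop of 0, while sub-spans covering positions 0 to 2 and 4 to 6 have two intervening positions.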
SpanNotQuery
Removes matches which overlap with another SpanQuery or within a x tokens before or y tokens after another SpanQuery.
SpanOrQuery
Matches the union of its clauses.
SpanPayloadCheckQuery
Only return those matches that have a specific payload at the given position.
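Do not use this with a SpanQuery that contains a SpanNearQuery. Instead, use SpanNearPayloadCheckQuery since it properly handles the fact that payloads aren't ordered by SpanNearQuery.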
SpanPositionCheckQuery
Base class for filtering a SpanQuery based on the position of a match.
SpanPositionCheckQuery.PositionCheckSpan
SpanPositionRangeQuery
Checks to see if the match lies between a start and end position.
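The range check can be modelled as a three-way decision. The sketch below is a stand-alone approximation, not the library's code: RangeCheckResult and PositionRangeSketch are hypothetical names, and the "stop looking" outcome relies on the assumption that matches are enumerated by increasing start position.

```csharp
// Hypothetical result type for illustration only; not a Lucene.NET enum.
public enum RangeCheckResult
{
    Yes,          // the match lies inside the range
    No,           // this match is rejected, but a later one may still fit
    NoAndAdvance  // no later match in this document can fit
}

// Hypothetical helper for illustration only; not a Lucene.NET class.
public static class PositionRangeSketch
{
    public static RangeCheckResult Check(int spanStart, int spanEnd, int rangeStart, int rangeEnd)
    {
        // Matches arrive by increasing start, so once a start reaches the end
        // of the range, every following match starts too late as well.
        if (spanStart >= rangeEnd)
        {
            return RangeCheckResult.NoAndAdvance;
        }
        if (spanStart >= rangeStart && spanEnd <= rangeEnd)
        {
            return RangeCheckResult.Yes;
        }
        return RangeCheckResult.No;
    }
}
```

With a range of 0 to 100, a span from 0 to 6 is accepted, a span from 98 to 103 is rejected because it runs past the end, and a span starting at 100 ends the search in that document.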
SpanQuery
Base class for span-based queries.
SpanRewriteMethod
Abstract class that defines how the query is rewritten.
Spans
Expert: an enumeration of span matches. Used to implement span searching. Each span represents a range of term positions within a document. Matches are enumerated in order, by increasing document number, within that by increasing start position and finally by increasing end position.
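The enumeration order described above can be expressed as a three-key sort. This is a stand-alone sketch using plain tuples; SpanOrderSketch and InEnumerationOrder are hypothetical names and do not correspond to any Lucene.NET API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not a Lucene.NET class.
public static class SpanOrderSketch
{
    // Orders matches by document number, then start position, then end position.
    public static List<(int Doc, int Start, int End)> InEnumerationOrder(
        IEnumerable<(int Doc, int Start, int End)> matches)
    {
        return matches
            .OrderBy(m => m.Doc)
            .ThenBy(m => m.Start)
            .ThenBy(m => m.End)
            .ToList();
    }
}
```

Two matches in the same document that start at the same position are therefore ordered with the shorter one first.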
SpanScorer
Public for extension only.
SpanTermQuery
Matches spans containing a term.
SpanWeight
Expert-only. Public for use by other weight implementations.
TermSpans
Expert: Public for extension only.
Interfaces
ISpanMultiTermQueryWrapper
LUCENENET specific interface for referring to/identifying a SpanMultiTermQueryWrapper<Q> without referring to its generic closing type.
Enums
SpanPositionCheckQuery.AcceptStatus
Return value for AcceptPosition(Spans).