Namespace Lucene.Net.Join
This module supports index-time and query-time joins.
Index-time joins
Index-time joining supports joins while searching, where joined documents are indexed as a single document block using IndexWriter.AddDocuments. This is useful for any normalized content, such as XML documents or database tables. In database terms, all rows for all joined tables matching a single row of the primary table must be indexed as a single document block, with the parent document last in the group.
When you index in this way, the documents in your index are divided into parent documents (the last document of each block) and child documents (all others). You provide a Filter that identifies the parent documents, as Lucene does not currently record any information about doc blocks.
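To make the block layout concrete, here is a minimal in-memory sketch in plain Java. It is not the Lucene.Net API. It assumes doc IDs are simply positions in a list and uses java.util.BitSet as a stand-in for the FixedBitSet that the parent Filter provides. The names BlockLayout and addBlock are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of block indexing: a doc ID is the doc's position in the list.
class BlockLayout {
    // Appends one block: all children first, then the parent last,
    // mirroring the order required when indexing a doc block.
    static void addBlock(List<String> docs, BitSet parents, String parent, List<String> children) {
        docs.addAll(children);
        docs.add(parent);
        parents.set(docs.size() - 1); // the parent is the last doc of its block
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        BitSet parents = new BitSet();
        addBlock(docs, parents, "resume:alice", List.of("job:acme", "job:globex"));
        addBlock(docs, parents, "resume:bob", List.of("job:initech"));
        System.out.println(docs);    // [job:acme, job:globex, resume:alice, job:initech, resume:bob]
        System.out.println(parents); // {2, 4}
    }
}
```

The parent bitset is the only record of where blocks end. This is why the application must supply it, for example through a Filter that matches a parent-only field.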
At search time, use ToParentBlockJoinQuery to remap/join matches from any child Query (i.e., a query that matches only child documents) up to the parent document space. The resulting query can then be used as a clause in any query that matches parent documents.
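The child-to-parent remapping can be sketched in plain Java. This sketch assumes doc IDs are positions in a block-indexed list and uses java.util.BitSet in place of FixedBitSet. The names JoinToParent and toParents are hypothetical. Because each block ends with its parent, a child's parent is the next set bit in the parent bitset after the child's doc ID. The sketch illustrates that idea only and is not ToParentBlockJoinQuery's implementation.

```java
import java.util.BitSet;

// Hypothetical sketch of joining child matches up to their parents.
// java.util.BitSet stands in for the FixedBitSet of parent docs.
class JoinToParent {
    static BitSet toParents(BitSet childMatches, BitSet parents) {
        BitSet result = new BitSet();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                child = childMatches.nextSetBit(child + 1)) {
            // The wrapped child query must never match a parent doc.
            if (parents.get(child)) {
                throw new IllegalStateException("child query matched parent doc " + child);
            }
            // Each block ends with its parent, so the owning parent is
            // the next parent bit after the child.
            int parent = parents.nextSetBit(child + 1);
            if (parent < 0) {
                throw new IllegalStateException("doc " + child + " is not followed by a parent");
            }
            result.set(parent);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(4);              // layout: c0 c1 P2 c3 P4
        BitSet childMatches = new BitSet();
        childMatches.set(1);
        childMatches.set(3);
        System.out.println(toParents(childMatches, parents)); // {2, 4}
    }
}
```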
If you only care about the parent documents matching the query, you can use any collector to collect the parent hits. If you also want to see which child documents matched for each parent document, use ToParentBlockJoinCollector to collect the hits. Once the search is done, retrieve a Lucene.Net.Grouping.TopGroups<TGroupValue> instance from the ToParentBlockJoinCollector.GetTopGroups method.
To map/join in the opposite direction, use ToChildBlockJoinQuery. This wraps any query matching parent documents, creating the joined query matching only child documents.
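The reverse direction follows from the block layout described above. The children of parent p are the docs strictly between the previous parent (or the start of the index) and p. The following hypothetical sketch illustrates this. It uses java.util.BitSet in place of FixedBitSet, and the names JoinToChildren and toChildren are invented for illustration. It is not ToChildBlockJoinQuery's implementation.

```java
import java.util.BitSet;

// Hypothetical sketch of joining parent matches down to their children.
class JoinToChildren {
    static BitSet toChildren(BitSet parentMatches, BitSet parents) {
        BitSet result = new BitSet();
        for (int p = parentMatches.nextSetBit(0); p >= 0; p = parentMatches.nextSetBit(p + 1)) {
            // The wrapped query must match only parent docs.
            if (!parents.get(p)) {
                throw new IllegalStateException("parent query matched non-parent doc " + p);
            }
            // previousSetBit returns -1 when there is no earlier parent,
            // so the first block starts at doc 0.
            int firstChild = parents.previousSetBit(p - 1) + 1;
            result.set(firstChild, p); // children occupy [firstChild, p); empty if none
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(4);              // layout: c0 c1 P2 c3 P4
        BitSet parentMatches = new BitSet();
        parentMatches.set(2);
        parentMatches.set(4);
        System.out.println(toChildren(parentMatches, parents)); // {0, 1, 3}
    }
}
```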
Query-time joins
Query-time joining is index-term based and implemented as a two-pass search. The first pass collects all the terms in a fromField of the documents that match the fromQuery. The second pass returns all documents whose toField contains any of the terms collected in the first pass.
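The two passes can be sketched in plain Java over hypothetical in-memory documents. Each document is a map from field name to its values, and the doc ID is the list index. This illustrates the term-based idea only. JoinUtil works against the index's terms, and QueryTimeJoin, join, and the field names in the example are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory sketch of a two-pass, term-based join.
// A "document" is a map from field name to its values; doc ID = list index.
class QueryTimeJoin {
    static List<Integer> join(List<Map<String, List<String>>> fromDocs,
                              Predicate<Map<String, List<String>>> fromQuery,
                              String fromField,
                              List<Map<String, List<String>>> toDocs,
                              String toField) {
        // Pass 1: collect the fromField terms of every doc matching fromQuery.
        Set<String> terms = new HashSet<>();
        for (Map<String, List<String>> doc : fromDocs) {
            if (fromQuery.test(doc)) {
                terms.addAll(doc.getOrDefault(fromField, List.of()));
            }
        }
        // Pass 2: keep docs whose toField shares at least one collected term.
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < toDocs.size(); id++) {
            for (String value : toDocs.get(id).getOrDefault(toField, List.of())) {
                if (terms.contains(value)) {
                    hits.add(id);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "from" side: products that reference a vendor by ID.
        List<Map<String, List<String>>> products = List.of(
                Map.of("content", List.of("laptop"), "vendorId", List.of("v1")),
                Map.of("content", List.of("phone"), "vendorId", List.of("v2")),
                Map.of("content", List.of("laptop"), "vendorId", List.of("v3")));
        // "to" side: vendors keyed by ID.
        List<Map<String, List<String>>> vendors = List.of(
                Map.of("id", List.of("v1")),
                Map.of("id", List.of("v2")),
                Map.of("id", List.of("v3")));
        // Vendors of products whose content contains "laptop".
        System.out.println(join(products, d -> d.get("content").contains("laptop"),
                "vendorId", vendors, "id")); // [0, 2]
    }
}
```

The sketch keeps the two sides in separate lists to mirror the fact that the from and to searchers may differ. Passing the same list twice models joining within one index.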
Query-time joining has the following input:
fromField
: The field to join from.
fromQuery
: The query executed to collect the from terms. This is usually the user-specified query.
multipleValuesPerDocument
: Whether the fromField contains more than one value per document.
scoreMode
: Defines how scores are translated to the other side of the join. If you don't care about scoring, use ScoreMode.None. This disables scoring and is therefore more efficient (it requires less memory and is faster).
toField
: The field to join to.

Query-time joining is accessible through a single static method. The caller supplies the input described above plus an IndexSearcher from which the from terms are collected. The returned query can be executed with the same IndexSearcher or with a different IndexSearcher. Example usage of JoinUtil.createJoinQuery:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;      // Query, TermQuery, TopDocs
    import org.apache.lucene.search.join.*; // JoinUtil, ScoreMode

    // searchTerm, fromSearcher and toSearcher (IndexSearcher instances) are assumed to be defined by the caller.
    String fromField = "from"; // Name of the from field
    boolean multipleValuesPerDocument = false; // Set to true only if your fromField has multiple values per document in your index
    String toField = "to"; // Name of the to field
    ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join
    Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values

    Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
    TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as fromSearcher
    // Render topDocs...
Classes
FixedBitSetCachingWrapperFilter
A CachingWrapperFilter that caches sets using a FixedBitSet, as required for joins.
JoinUtil
Utility for query time joining using Lucene.Net.Join.TermsQuery and Lucene.Net.Join.TermsCollector.
ToChildBlockJoinQuery
Just like ToParentBlockJoinQuery, except this query joins in reverse: you provide a Query matching parent documents and it joins down to child documents.
ToParentBlockJoinCollector
Collects parent document hits for a Query containing one or more ToParentBlockJoinQuery clauses, sorted by the specified parent Sort. Note that this cannot perform arbitrary joins. Rather, it requires that all joined documents are indexed as a doc block (using AddDocuments(IEnumerable<IEnumerable<IIndexableField>>, Analyzer) or UpdateDocuments(Term, IEnumerable<IEnumerable<IIndexableField>>, Analyzer)). That is, the join is computed at index time.
The parent Sort must only use fields from the parent documents; sorting by field in the child documents is not supported.
You should only use this collector if one or more of the clauses in the query is a ToParentBlockJoinQuery. This collector will find those query clauses and record the matching child documents for the top scoring parent documents.
Multiple joins (star join) and nested joins and a mix of the two are allowed, as long as in all cases the documents corresponding to a single row of each joined parent table were indexed as a doc block.
For the simple star join, you can retrieve the ITopGroups<TGroupValue> instance containing each ToParentBlockJoinQuery's matching child documents for the top parent groups, using GetTopGroups(ToParentBlockJoinQuery, Sort, Int32, Int32, Int32, Boolean). That is, a single query containing two or more ToParentBlockJoinQuery clauses (representing the star join) can retrieve two or more ITopGroups<TGroupValue> instances.
For nested joins, the query will run correctly (i.e., match the right parent and child documents). However, TopGroups<TGroupValue> currently cannot support nesting, because a group cannot hold another TopGroups<TGroupValue>. As a result, you can only retrieve the TopGroups<TGroupValue> of the first join. The TopGroups<TGroupValue> of the nested joins will not be correct.
See http://lucene.apache.org/core/4_8_0/join/ for a code sample.
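The bookkeeping this collector performs can be sketched roughly as follows: each matching child is grouped under its parent, and only the top parents are kept. This hypothetical sketch makes several simplifications. It ranks parents by a precomputed score map rather than by an arbitrary parent Sort. It uses java.util.BitSet in place of FixedBitSet. It assumes every child is followed by its parent. GroupChildHits and topGroups are invented names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group matching child docs under their parents and
// keep the top-N parents, ranked here by a score map (not a parent Sort).
class GroupChildHits {
    static Map<Integer, List<Integer>> topGroups(BitSet childMatches, BitSet parents,
                                                 Map<Integer, Float> parentScores, int topN) {
        // Record each matching child under its parent (the next parent bit).
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int c = childMatches.nextSetBit(0); c >= 0; c = childMatches.nextSetBit(c + 1)) {
            groups.computeIfAbsent(parents.nextSetBit(c + 1), k -> new ArrayList<>()).add(c);
        }
        // Order parents by descending score (a missing score counts as 0) and keep topN.
        Comparator<Integer> byScore = Comparator.comparing(p -> parentScores.getOrDefault(p, 0f));
        Map<Integer, List<Integer>> top = new LinkedHashMap<>();
        groups.keySet().stream()
                .sorted(byScore.reversed())
                .limit(topN)
                .forEach(p -> top.put(p, groups.get(p)));
        return top;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(5);                 // layout: c0 c1 P2 c3 c4 P5
        BitSet childMatches = new BitSet();
        childMatches.set(0);
        childMatches.set(1);
        childMatches.set(4);
        Map<Integer, Float> scores = Map.of(2, 1.0f, 5, 3.0f);
        System.out.println(topGroups(childMatches, parents, scores, 2)); // {5=[4], 2=[0, 1]}
    }
}
```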
ToParentBlockJoinFieldComparer
A field comparer that allows parent documents to be sorted by fields from the nested / child documents.
ToParentBlockJoinFieldComparer.Highest
Concrete implementation of ToParentBlockJoinSortField that sorts the parent docs with the highest values in the child / nested docs first.
ToParentBlockJoinFieldComparer.Lowest
Concrete implementation of ToParentBlockJoinSortField that sorts the parent docs with the lowest values in the child / nested docs first.
ToParentBlockJoinQuery
This query requires that you index children and parent docs as a single block, using the AddDocuments(IEnumerable<IEnumerable<IIndexableField>>, Analyzer) or UpdateDocuments(Term, IEnumerable<IEnumerable<IIndexableField>>, Analyzer) API. In each block, the child documents must appear first, ending with the parent document. At search time you provide a Filter identifying the parents; however, this Filter must provide a FixedBitSet per sub-reader.
Once the block index is built, use this query to wrap any sub-query matching only child docs and join matches in that child document space up to the parent document space. You can then use this Query as a clause with other queries in the parent document space.
See ToChildBlockJoinQuery if you need to join in the reverse order.
The child documents must be orthogonal to the parent documents: the wrapped child query must never return a parent document.
If you'd like to retrieve ITopGroups<TGroupValue> for the resulting query, use the ToParentBlockJoinCollector. This is not required. If you simply want to collect the parent documents and don't need to see which child documents matched under each parent, you can use any collector.
NOTE: If the overall query contains parent-only matches (for example, when you OR a parent-only query with a joined child-only query), the collected documents will be correct. However, the ITopGroups<TGroupValue> you get from ToParentBlockJoinCollector will not contain every child for the parents that matched.
See http://lucene.apache.org/core/4_8_0/join/ for an overview.
ToParentBlockJoinSortField
A special sort field that allows sorting parent docs based on nested / child level fields. Depending on the sort order, it takes either the lowest or the highest child field value into account.
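The min/max idea can be sketched as below. This is a hypothetical illustration, not ToParentBlockJoinSortField itself. Child values are held in a plain long[] indexed by doc ID. Setting highest to true mimics the Highest comparer (descending by largest child value), and false mimics Lowest (ascending by smallest child value). Placing childless parents last is an assumption of this sketch. SortParentsByChild and sortParents are invented names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: order parents by the min or max value among their
// children. values[doc] holds a child's field value; parent slots are ignored.
class SortParentsByChild {
    static List<Integer> sortParents(BitSet parents, long[] values, boolean highest) {
        Map<Integer, Long> keyByParent = new TreeMap<>();
        int firstChild = 0;
        for (int p = parents.nextSetBit(0); p >= 0; p = parents.nextSetBit(p + 1)) {
            // Start from the worst possible key so childless parents sort last.
            long key = highest ? Long.MIN_VALUE : Long.MAX_VALUE;
            for (int c = firstChild; c < p; c++) {
                key = highest ? Math.max(key, values[c]) : Math.min(key, values[c]);
            }
            keyByParent.put(p, key);
            firstChild = p + 1; // the next block starts right after this parent
        }
        Comparator<Integer> byKey = Comparator.comparing(keyByParent::get);
        List<Integer> ids = new ArrayList<>(keyByParent.keySet());
        // Highest: largest child value first. Lowest: smallest child value first.
        ids.sort(highest ? byKey.reversed() : byKey);
        return ids;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(5);                     // layout: c0 c1 P2 c3 c4 P5
        long[] values = {5, 1, 0, 3, 9, 0}; // child values; parent slots unused
        System.out.println(sortParents(parents, values, true));  // [5, 2]
        System.out.println(sortParents(parents, values, false)); // [2, 5]
    }
}
```

In the example, the two modes disagree on the order. Parent 5 has the single largest child value (9), while parent 2 has the single smallest (1).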
Enums
ScoreMode
How to aggregate multiple child hit scores into a single parent score.
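The following hypothetical sketch shows how child hit scores could be folded into a single parent score. The Mode constants mirror the names of the ScoreMode values in Lucene 4.8 (None, Avg, Max, Total). The arithmetic is an illustrative assumption rather than the library's code. In particular, returning a constant 0 for None is arbitrary. ScoreAggregation and aggregate are invented names.

```java
import java.util.List;

// Hypothetical sketch: folds child hit scores into a single parent score.
class ScoreAggregation {
    // Mirrors the names of the join ScoreMode values; not the library enum.
    enum Mode { None, Avg, Max, Total }

    // Assumes childScores is non-empty: a parent only matches if a child did.
    static float aggregate(List<Float> childScores, Mode mode) {
        float total = 0f;
        float max = Float.NEGATIVE_INFINITY;
        for (float s : childScores) {
            total += s;
            max = Math.max(max, s);
        }
        switch (mode) {
            case None:  return 0f; // scoring disabled; constant chosen arbitrarily
            case Avg:   return total / childScores.size();
            case Max:   return max;
            case Total: return total;
            default:    throw new IllegalArgumentException("unknown mode " + mode);
        }
    }

    public static void main(String[] args) {
        List<Float> scores = List.of(1f, 2f, 6f);
        for (Mode m : Mode.values()) {
            // Prints one line per mode: None -> 0.0, Avg -> 3.0, Max -> 6.0, Total -> 9.0
            System.out.println(m + " -> " + aggregate(scores, m));
        }
    }
}
```

Skipping score computation entirely, as the None mode allows, is what makes it cheaper. The other modes must read every matching child's score.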