Namespace Lucene.Net.Spatial.Prefix
Prefix Tree Strategy
Classes
AbstractPrefixTreeFilter
Base class for Lucene Filters on SpatialPrefixTree fields.
AbstractPrefixTreeFilter.BaseTermsEnumTraverser
Holds transient state and docid collecting utility methods as part of traversing a TermsEnum.
AbstractVisitingPrefixTreeFilter
Traverses a SpatialPrefixTree indexed field, using the template & visitor design patterns for subclasses to guide the traversal and collect matching documents.
Subclasses implement GetDocIdSet(AtomicReaderContext, IBits) by instantiating a custom AbstractVisitingPrefixTreeFilter.VisitorTemplate subclass (i.e. an anonymous inner class) and implementing its required methods.
AbstractVisitingPrefixTreeFilter.VisitorTemplate
An abstract class designed to make it easy to implement predicates or other operations on a SpatialPrefixTree indexed field. An instance of this class is not designed to be reused across AtomicReaderContext instances, so simply create a new one for each call to, say, GetDocIdSet(AtomicReaderContext, IBits).
The GetDocIdSet() method here starts the work. It first checks that there are indexed terms; if not, it quickly returns null. Then it calls Start() so a subclass can set up a return value, like a FixedBitSet. Then it starts the traversal process, calling FindSubCellsToVisit(Cell), which by default finds the top cells that intersect queryShape. If a cell returned by this method has no corresponding indexed cell, the traversal short-circuits until it finds one, at which point Visit(Cell) is called. At some depth of the tree, the algorithm switches to a scanning mode that calls VisitScanned(Cell) for each leaf cell found.
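The traversal described above can be sketched in a few lines. The following is a simplified model in Python, not the Lucene.NET implementation: it assumes a quad tree over the unit square whose cell tokens are strings of the digits 0 to 3, a sorted list of tokens standing in for the TermsEnum, and rectangles as query shapes. The names cell_box, point_token, search and scan_level are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left

MAX_LEVEL = 4  # depth of the hypothetical quad tree (assumption)


def cell_box(token):
    """Return (x0, y0, x1, y1) of the quad-tree cell named by `token`."""
    x0, y0, size = 0.0, 0.0, 1.0
    for ch in token:
        size /= 2
        q = int(ch)
        x0 += size * (q & 1)   # low bit selects the right half
        y0 += size * (q >> 1)  # high bit selects the upper half
    return x0, y0, x0 + size, y0 + size


def point_token(x, y, level=MAX_LEVEL):
    """Token of the cell at `level` that contains the point (x, y)."""
    token, x0, y0, size = "", 0.0, 0.0, 1.0
    for _ in range(level):
        size /= 2
        qx, qy = int(x >= x0 + size), int(y >= y0 + size)
        x0, y0 = x0 + qx * size, y0 + qy * size
        token += str(qy * 2 + qx)
    return token


def intersects(a, b):
    """True if two (x0, y0, x1, y1) boxes overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def search(index, query, scan_level=2):
    """index maps leaf tokens to sets of doc ids; query is a box."""
    terms = sorted(index)
    if not terms:
        return None              # no indexed terms: return quickly
    hits = set()                 # the result set a Start() step would create
    stack = [""]                 # begin at the root cell
    while stack:
        cell = stack.pop()
        i = bisect_left(terms, cell)  # seek to the first term >= cell
        if i == len(terms) or not terms[i].startswith(cell):
            continue             # nothing indexed under this cell: skip it
        if len(cell) >= scan_level:
            # Scanning mode: walk every indexed leaf under this cell.
            while i < len(terms) and terms[i].startswith(cell):
                if intersects(cell_box(terms[i]), query):
                    hits |= index[terms[i]]
                i += 1
            continue
        # Otherwise descend into the sub-cells that intersect the query.
        stack.extend(cell + q for q in "0123"
                     if intersects(cell_box(cell + q), query))
    return hits


# Usage: index three points, then query the lower-left quadrant.
index = {}
for doc, (x, y) in {1: (0.1, 0.1), 2: (0.9, 0.9), 3: (0.3, 0.2)}.items():
    index.setdefault(point_token(x, y), set()).add(doc)
hits = search(index, (0.0, 0.0, 0.5, 0.5))
print(sorted(hits))  # [1, 3]
```

In this sketch the seek-and-skip step plays the role of the short-circuiting, the descent step corresponds loosely to FindSubCellsToVisit(Cell), and the inner loop corresponds to VisitScanned(Cell). Matching is by cell, so results are approximate to the grid precision.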
AbstractVisitingPrefixTreeFilter.VNode
A Visitor node/cell found via the query shape for AbstractVisitingPrefixTreeFilter.VisitorTemplate. Sometimes these are reset(cell). It's like a LinkedList node but forms a tree.
ContainsPrefixTreeFilter
Finds docs where its indexed shape Contains the query shape. For use on RecursivePrefixTreeStrategy.
IntersectsPrefixTreeFilter
A Filter matching documents that have an
PointPrefixTreeFieldCacheProvider
Implementation of ShapeFieldCacheProvider<T> designed for PrefixTreeStrategy implementations.
Note, due to the fragmented representation of Shapes in these Strategies, this implementation
can only retrieve the central
PrefixTreeStrategy
An abstract SpatialStrategy based on SpatialPrefixTree. The two subclasses are RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy. This strategy is most effective as a fast approximate spatial search filter.
Characteristics:
- Can index any shape; however only RecursivePrefixTreeStrategy can effectively search non-point shapes.
- Can index a variable number of shapes per field value. This strategy can do it via multiple calls to CreateIndexableFields(IShape) for a document or by giving it some sort of Shape aggregate (e.g. NTS WKT MultiPoint). The shape's boundary is approximated to a grid precision.
- Can query with any shape. The shape's boundary is approximated to a grid precision.
- Only Intersects is supported. If only points are indexed then this is effectively equivalent to IsWithin.
- The strategy supports MakeDistanceValueSource(IPoint, Double) even for multi-valued data, so long as the indexed data is all points; the behavior is undefined otherwise. However, it will likely be removed in the future in favor of another strategy with a more scalable implementation. Use of this call is the only circumstance in which a cache is used. The cache is simple, but as such it doesn't scale to large numbers of points, nor is it real-time-search friendly.
Implementation:
The SpatialPrefixTree does most of the work, for example returning a list of terms representing grids of various sizes for a supplied shape. An important configuration item is DistErrPct, which balances shape precision against scalability. See those docs.
RecursivePrefixTreeStrategy
A PrefixTreeStrategy which uses AbstractVisitingPrefixTreeFilter. This strategy has support for searching non-point shapes (note: not tested). Even a query shape with distErrPct=0 (fully precise to the grid) should have good performance for typical data, unless there is a lot of indexed data coincident with the shape's edge.
TermQueryPrefixTreeStrategy
A basic implementation of PrefixTreeStrategy using a large TermsFilter of all the cells from GetCells(IShape, Int32, Boolean, Boolean). It only supports the search of indexed Point shapes.
The precision of query shapes (DistErrPct) is an important factor in using this Strategy. If the precision is too fine, the query shape is covered by many cells and therefore many terms, which makes for a slower query.
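To make the precision trade-off concrete, the sketch below counts how many grid cells are needed to cover a rectangle as the maximum detail level grows. It is a Python model under stated assumptions, not Lucene.NET code: the grid is a quad tree over the unit square, a cell lying fully inside the query is emitted once without descending further, and the function name cover is hypothetical. How a DistErrPct value maps to a detail level is not modeled here.

```python
def cover(query, max_level, box=(0.0, 0.0, 1.0, 1.0), level=0):
    """Count the quad-tree cells needed to cover `query` down to `max_level`."""
    x0, y0, x1, y1 = box
    qx0, qy0, qx1, qy1 = query
    if x1 <= qx0 or qx1 <= x0 or y1 <= qy0 or qy1 <= y0:
        return 0  # cell and query are disjoint: no term needed
    inside = qx0 <= x0 and qy0 <= y0 and x1 <= qx1 and y1 <= qy1
    if inside or level == max_level:
        return 1  # emit this cell as a single term
    # Cell straddles the query boundary: split into four children.
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    children = [(x0, y0, mx, my), (mx, y0, x1, my),
                (x0, my, mx, y1), (mx, my, x1, y1)]
    return sum(cover(query, max_level, c, level + 1) for c in children)


# Usage: the same query rectangle at increasing detail levels.
query = (0.1, 0.1, 0.6, 0.6)
counts = [cover(query, lvl) for lvl in (1, 2, 3)]
print(counts)  # [4, 9, 22]
```

Each extra level refines the cells along the query's boundary, so the term count keeps climbing even though the interior is still covered by a few large cells.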
WithinPrefixTreeFilter
Finds docs where its indexed shape IsWithin the query shape. It works by looking at cells outside of the query shape to ensure documents there are excluded. By default, it will examine all cells, and it's fairly slow. If you know that the indexed shapes are never composed of multiple disjoint parts (which also means the field is not multi-valued), then you can pass SpatialPrefixTree.GetDistanceForLevel(maxLevels) as the queryBuffer constructor parameter to look only this minimal distance beyond the query shape's edge. Even if the indexed shapes are sometimes composed of multiple disjoint parts, you might want to use this option with a large buffer as a faster approximation with minimal false positives.
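The effect of the buffer can be illustrated with a small model. This Python sketch is an assumption-laden simplification rather than the filter's actual algorithm: shapes are sets of integer grid cells, the query is an inclusive cell range, and the buffer is measured in whole cells. The function name within and the sample documents are hypothetical.

```python
def within(docs, query, buffer=None):
    """Return ids of docs whose cells all lie inside `query`.

    docs:   dict of doc id -> set of (col, row) cells covering its shape
    query:  (c0, r0, c1, r1), an inclusive range of cells
    buffer: how many cells beyond the query edge to examine;
            None means examine every cell (the slow, exact mode)
    """
    c0, r0, c1, r1 = query

    def inside(cell):
        return c0 <= cell[0] <= c1 and r0 <= cell[1] <= r1

    def examined(cell):
        if buffer is None:
            return True
        return (c0 - buffer <= cell[0] <= c1 + buffer
                and r0 - buffer <= cell[1] <= r1 + buffer)

    # Docs with at least one cell inside the query are candidates.
    candidates = {d for d, cells in docs.items()
                  if any(inside(c) for c in cells)}
    # Docs seen in an examined cell outside the query are excluded.
    excluded = {d for d, cells in docs.items()
                if any(examined(c) and not inside(c) for c in cells)}
    return candidates - excluded


# Usage: "c" has two disjoint parts, one of them far from the query.
docs = {
    "a": {(2, 2), (3, 2)},   # entirely inside the query
    "b": {(4, 4), (5, 4)},   # straddles the query edge
    "c": {(2, 3), (9, 9)},   # one part inside, one part far away
}
query = (1, 1, 4, 4)
exact = within(docs, query)             # examines all cells
fast = within(docs, query, buffer=1)    # examines one cell past the edge
print(sorted(exact), sorted(fast))  # ['a'] ['a', 'c']
```

With the small buffer, the far-away part of "c" is never examined, so "c" shows up as a false positive. This is why the buffered mode is only exact when indexed shapes have no disjoint parts.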