Class CachingCollector

Caches all docs, and optionally also scores, coming from a search, and is then able to replay them to another collector. You specify the max RAM this class may use. Once the collection is done, check IsCached. If it returns true, you can call Replay(ICollector) against a new collector. If it returns false, too much RAM was required and you must instead re-run the original search.

NOTE: this class consumes 4 bytes (or 8 bytes, if scores are cached) per collected document. If the result set is large, this can easily be a very substantial amount of RAM!

NOTE: this class caches at least 128 documents before checking RAM limits.
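
The per-document cost stated above makes it possible to estimate the cache size before choosing a limit. The following is a minimal sketch of that arithmetic. The class and method names are hypothetical and not part of Lucene.NET, and the estimate counts only the 4 or 8 bytes per document given above, ignoring any array or object overhead.

```csharp
public static class CacheRamEstimate
{
    // Hypothetical helper: bytes needed to cache docCount hits,
    // using the documented 4 bytes per doc ID plus 4 more per cached score.
    public static long EstimateBytes(long docCount, bool cacheScores)
    {
        int bytesPerDoc = cacheScores ? 8 : 4;
        return docCount * bytesPerDoc;
    }

    // Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes here).
    public static double ToMegabytes(long bytes)
    {
        return bytes / (1024.0 * 1024.0);
    }
}
```

For example, 10 million collected documents come to 40,000,000 bytes (roughly 38 MB) without scores and 80,000,000 bytes (roughly 76 MB) with scores.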

See the Lucene modules/grouping module for more details including a full code example.
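
The workflow described above (collect once, check IsCached, then replay or re-run) might look like the sketch below. It assumes a Lucene.NET 4.8 style API; the helper class and method names, the choice of TopScoreDocCollector and TotalHitCountCollector, and the 16 MB limit are illustrative assumptions, not recommendations.

```csharp
using Lucene.Net.Search;

public static class CachingCollectorSketch
{
    // Hypothetical helper: runs the query once for the top hits, then
    // obtains the total hit count from the cache when possible.
    public static int CountWithReplay(IndexSearcher searcher, Query query)
    {
        // First pass: wrap the "real" collector so hits are cached as they arrive.
        TopScoreDocCollector first = TopScoreDocCollector.Create(10, true);
        CachingCollector cache = CachingCollector.Create(first, true, 16.0);
        searcher.Search(query, cache);

        // Second pass: replay from the cache, or fall back to a new search
        // if the cache exceeded its RAM limit.
        TotalHitCountCollector second = new TotalHitCountCollector();
        if (cache.IsCached)
        {
            cache.Replay(second);
        }
        else
        {
            searcher.Search(query, second);
        }
        return second.TotalHits;
    }
}
```

Either branch produces the same count; the replay branch simply avoids executing the query a second time.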

Note

This API is experimental and might change in incompatible ways in the next release.

Inheritance

object

CachingCollector

Implements

ICollector

Inherited Members

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Search

Assembly: Lucene.Net.dll

Syntax

public abstract class CachingCollector : ICollector

Fields

m_base

Declaration

protected int m_base

Field Value

Type	Description
int

m_cachedDocs

Declaration

protected readonly IList<int[]> m_cachedDocs

Field Value

Type	Description
IList<int[]>

m_curDocs

Declaration

protected int[] m_curDocs

Field Value

Type	Description
int[]

m_lastDocBase

Declaration

protected int m_lastDocBase

Field Value

Type	Description
int

m_maxDocsToCache

Declaration

protected readonly int m_maxDocsToCache

Field Value

Type	Description
int

m_other

Declaration

protected readonly ICollector m_other

Field Value

Type	Description
ICollector

m_upto

Declaration

protected int m_upto

Field Value

Type	Description
int

Properties

AcceptsDocsOutOfOrder

Return true if this collector does not require the matching docIDs to be delivered in int sort order (smallest to largest) to Collect(int).

Most Lucene Query implementations will visit matching docIDs in order. However, some queries (currently limited to certain cases of BooleanQuery) can achieve faster searching if the ICollector allows them to deliver the docIDs out of order.

Many collectors don't mind getting docIDs out of order, so it's important to return true here.

Declaration

public virtual bool AcceptsDocsOutOfOrder { get; }

Property Value

Type	Description
bool

IsCached

Returns true if all collected documents (and scores, if score caching is enabled) fit within the cache limit and can be replayed with Replay(ICollector). Returns false if the limit was exceeded, in which case the original search must be re-run.

Declaration

public virtual bool IsCached { get; }

Property Value

Type	Description
bool

Methods

Collect(int)

Called once for every document matching a query, with the unbased document number.

Note: The collection of the current segment can be terminated by throwing a CollectionTerminatedException. In this case, the last docs of the current AtomicReaderContext will be skipped and IndexSearcher will swallow the exception and continue collection with the next leaf.

Note: this is called in an inner search loop. For good search performance, implementations of this method should not call Doc(int) or Document(int) on every hit. Doing so can slow searches by an order of magnitude or more.

Declaration

public abstract void Collect(int doc)

Parameters

Type	Name	Description
int	doc

Create(ICollector, bool, double)

Create a new CachingCollector that wraps the given collector and caches documents and scores up to the specified RAM threshold.

Declaration

public static CachingCollector Create(ICollector other, bool cacheScores, double maxRAMMB)

Parameters

Type	Name	Description
ICollector	other	The ICollector to wrap and delegate calls to.
bool	cacheScores	Whether to cache scores in addition to document IDs. Note that this increases the RAM consumed per doc.
double	maxRAMMB	The maximum RAM in MB to consume for caching the documents and scores. If the collector exceeds the threshold, no documents and scores are cached.

Returns

Type	Description
CachingCollector

Create(ICollector, bool, int)

Create a new CachingCollector that wraps the given collector and caches documents and scores up to the specified max docs threshold.

Declaration

public static CachingCollector Create(ICollector other, bool cacheScores, int maxDocsToCache)

Parameters

Type	Name	Description
ICollector	other	The ICollector to wrap and delegate calls to.
bool	cacheScores	Whether to cache scores in addition to document IDs. Note that this increases the RAM consumed per doc.
int	maxDocsToCache	The maximum number of documents to cache, along with their scores if cacheScores is true. If the collector exceeds the threshold, no documents and scores are cached.

Returns

Type	Description
CachingCollector

Create(bool, bool, double)

Creates a CachingCollector which does not wrap another collector. The cached documents and scores can later be replayed (Replay(ICollector)).

Declaration

public static CachingCollector Create(bool acceptDocsOutOfOrder, bool cacheScores, double maxRAMMB)

Parameters

Type	Name	Description
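bool	acceptDocsOutOfOrder	Whether documents are allowed to be collected out of order.
bool	cacheScores	Whether to cache scores in addition to document IDs. Note that this increases the RAM consumed per doc.
double	maxRAMMB	The maximum RAM in MB to consume for caching the documents and scores. If the collector exceeds the threshold, no documents and scores are cached.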

Returns

Type	Description
CachingCollector
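
A collector created with this overload can serve as a standalone cache that feeds several collectors from one search. The sketch below assumes a Lucene.NET 4.8 style API; the helper names, the two target collectors, and the 8 MB limit are hypothetical choices made for illustration.

```csharp
using Lucene.Net.Search;

public static class StandaloneCacheSketch
{
    // Hypothetical helper: executes the query once into a non-wrapping cache,
    // then replays the cached hits to two independent collectors.
    public static (int TotalHits, TopDocs Top) CountAndRank(IndexSearcher searcher, Query query)
    {
        var counter = new TotalHitCountCollector();
        TopScoreDocCollector ranker = TopScoreDocCollector.Create(10, true);

        // In-order collection with scores cached, so score-based collectors can be replayed to.
        CachingCollector cache = CachingCollector.Create(false, true, 8.0);
        searcher.Search(query, cache);

        if (cache.IsCached)
        {
            cache.Replay(counter);
            cache.Replay(ranker);
        }
        else
        {
            // The cache overflowed its RAM limit: run the search again for each collector.
            searcher.Search(query, counter);
            searcher.Search(query, ranker);
        }
        return (counter.TotalHits, ranker.GetTopDocs());
    }
}
```

Passing false for acceptDocsOutOfOrder keeps the cached hits in docID order, so they can be replayed to collectors that require in-order delivery.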

Replay(ICollector)

Replays the cached doc IDs (and scores) to the given ICollector. If this instance does not cache scores, SetScorer(Scorer) is not called on other and scores are not replayed.

Declaration

public abstract void Replay(ICollector other)

Parameters

Type	Name	Description
ICollector	other

Exceptions

Type	Condition
InvalidOperationException	If this collector is not cached (i.e., if the RAM limits were too low for the number of documents + scores to cache).
ArgumentException	If the given collector does not support out-of-order collection while the collector passed to Create does.
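
Both exception conditions can be checked up front using IsCached and AcceptsDocsOutOfOrder. The following is a minimal sketch of such a guard; the helper name TryReplay is hypothetical, and it assumes that AcceptsDocsOutOfOrder on the caching collector reflects the ordering under which documents were originally collected.

```csharp
using Lucene.Net.Search;

public static class ReplayGuardSketch
{
    // Hypothetical helper: replays only when neither documented exception
    // condition applies; returns false when the caller must re-run the search.
    public static bool TryReplay(CachingCollector cache, ICollector target)
    {
        // Would throw InvalidOperationException: nothing usable was cached.
        if (!cache.IsCached)
        {
            return false;
        }

        // Would throw ArgumentException: hits may have been cached out of order,
        // but the target requires in-order delivery.
        if (!target.AcceptsDocsOutOfOrder && cache.AcceptsDocsOutOfOrder)
        {
            return false;
        }

        cache.Replay(target);
        return true;
    }
}
```

A false return tells the caller to fall back to running the original search directly against the target collector.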

SetNextReader(AtomicReaderContext)

Called before collecting from each AtomicReaderContext. All doc ids in Collect(int) will correspond to Reader.

Add DocBase to the current Reader's internal document id to re-base ids in Collect(int).

Declaration

public virtual void SetNextReader(AtomicReaderContext context)

Parameters

Type	Name	Description
AtomicReaderContext	context	next atomic reader context

SetScorer(Scorer)

Called before successive calls to Collect(int). Implementations that need the score of the current document (passed in to Collect(int)) should save the passed-in Scorer and call GetScore() when needed.

Declaration

public abstract void SetScorer(Scorer scorer)

Parameters

Type	Name	Description
Scorer	scorer

Implements

ICollector