Class FieldCacheTermsFilter

A Filter that only accepts documents whose single term value in the specified field is contained in the provided set of allowed terms.

This provides the same functionality as TermsFilter (from queries/), except that this filter requires the field to contain only a single term for every document. Because the implementations differ drastically, the two filters also have different performance characteristics, as described below.

The first invocation of this filter on a given field will be slower, since a SortedDocValues must be created. Subsequent invocations using the same field will re-use this cache. However, as with all functionality based on IFieldCache, persistent RAM is consumed to hold the cache, and is not freed until the IndexReader is disposed. In contrast, TermsFilter has no persistent RAM consumption.

With each search, this filter translates the specified set of Terms into a private FixedBitSet keyed by term number, once per unique IndexReader (normally one reader per segment). Then, during matching, the term number for each docID is retrieved from the cache and checked for inclusion using the FixedBitSet. Since all testing is done using RAM-resident data structures, performance should be very fast, most likely fast enough to not require further caching of the DocIdSet for each possible combination of terms. However, because docIDs are simply scanned linearly, an index with a great many small documents may find this linear scan too costly.
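The two steps described above (translate the allowed terms into a bit set keyed by term number, then scan docIDs linearly) can be sketched in plain C#. This is a minimal illustration only, not the Lucene.NET implementation: `SegmentOrdinals`, `OrdinalFilterSketch`, and the use of `BitArray` in place of FixedBitSet are all assumptions made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the per-segment cache: a sorted term dictionary
// plus one term number per document (-1 means the document has no value).
public sealed class SegmentOrdinals
{
    public string[] SortedTerms { get; }
    public int[] DocToOrd { get; }

    public SegmentOrdinals(string[] sortedTerms, int[] docToOrd)
    {
        SortedTerms = sortedTerms;
        DocToOrd = docToOrd;
    }
}

public static class OrdinalFilterSketch
{
    // Step 1: translate the allowed terms into a bit set keyed by term number.
    // Terms that do not occur in this segment are simply skipped.
    public static BitArray BuildAllowedOrds(SegmentOrdinals segment, IEnumerable<string> allowedTerms)
    {
        var allowed = new BitArray(segment.SortedTerms.Length);
        foreach (string term in allowedTerms)
        {
            int ord = Array.BinarySearch(segment.SortedTerms, term, StringComparer.Ordinal);
            if (ord >= 0)
            {
                allowed.Set(ord, true);
            }
        }
        return allowed;
    }

    // Step 2: scan every docID linearly, look up its term number,
    // and test that number against the bit set.
    public static List<int> MatchingDocs(SegmentOrdinals segment, IEnumerable<string> allowedTerms)
    {
        BitArray allowed = BuildAllowedOrds(segment, allowedTerms);
        var matches = new List<int>();
        for (int docId = 0; docId < segment.DocToOrd.Length; docId++)
        {
            int ord = segment.DocToOrd[docId];
            if (ord >= 0 && allowed[ord])
            {
                matches.Add(docId);
            }
        }
        return matches;
    }
}
```

The cost of step 2 grows with the number of documents in the segment rather than with the number of allowed terms, which is why changing the allowed set between searches stays cheap once the per-document term numbers are cached.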

In contrast, TermsFilter builds up a FixedBitSet, keyed by docID, every time it's created, by enumerating through all matching docs using DocsEnum to seek and scan through each term's docID list. While there is no linear scan of all docIDs, besides the allocation of the underlying array in the FixedBitSet, this approach requires a number of "disk seeks" in proportion to the number of terms, which can be exceptionally costly when there are cache misses in the OS's IO cache.
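For comparison, the docID-keyed approach described above can be sketched the same way. Again this is only an illustration under assumptions: the `postings` dictionary is a hypothetical in-memory stand-in for the per-term docID lists that the real filter reads through DocsEnum, and `BitArray` stands in for FixedBitSet.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class PostingsFilterSketch
{
    // postings: hypothetical stand-in for the inverted index (term -> docIDs).
    // maxDoc: number of documents in the segment; sizes the docID-keyed bit set.
    public static BitArray BuildDocBits(
        IReadOnlyDictionary<string, int[]> postings,
        IEnumerable<string> allowedTerms,
        int maxDoc)
    {
        var bits = new BitArray(maxDoc);
        foreach (string term in allowedTerms)
        {
            // One lookup per term; in a real index this is the step that may
            // hit the disk when the data is not in the OS's IO cache.
            if (postings.TryGetValue(term, out int[] docIds))
            {
                foreach (int docId in docIds)
                {
                    bits.Set(docId, true);
                }
            }
        }
        return bits;
    }
}
```

Here the work is proportional to the number of allowed terms plus the number of documents they match, so a handful of rare terms is cheap while a large term set is not.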

Generally, this filter will be slower on the first invocation for a given field, but subsequent invocations, even if you change the allowed set of Terms, should be faster than TermsFilter, especially as the number of Terms being matched increases. If you are matching only a very small number of terms, and those terms in turn match a very small number of documents, TermsFilter may perform faster.

Which filter is best depends heavily on the application.
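A short usage sketch follows, assuming Lucene.NET 4.8 and an already opened IndexSearcher. The field name "category" and its values are hypothetical, and the field is assumed to hold a single untokenized term per document.

```csharp
using Lucene.Net.Search;

public static class CategoryFilterExample
{
    // Builds a filter that accepts documents whose "category" value is
    // either "books" or "music" (hypothetical field and values).
    public static Filter CreateFilter()
    {
        return new FieldCacheTermsFilter("category", "books", "music");
    }

    // Runs a match-all query restricted by the filter. The first call for a
    // given field and reader pays the cost of populating the field cache.
    public static TopDocs SearchAllowed(IndexSearcher searcher, int maxHits)
    {
        return searcher.Search(new MatchAllDocsQuery(), CreateFilter(), maxHits);
    }
}
```

Building a new filter with a different set of terms for the same field reuses the cached per-document term numbers, so only the small term-number bit set is rebuilt.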

Type	Name	Description
string	field
BytesRef[]	terms

Type	Name	Description
string	field
string[]	terms

Type	Description
IFieldCache

Creates a DocIdSet enumerating the documents that should be permitted in search results. NOTE: null can be returned if no documents are accepted by this Filter.

Note: this method will be called once per segment in the index during searching. The returned DocIdSet must refer to document IDs for that segment, not for the top-level reader.
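The distinction between segment-relative and top-level document IDs can be illustrated with a small sketch. It assumes each segment has a starting offset into the top-level docID space (exposed in Lucene.NET as the DocBase of an AtomicReaderContext); the `SegmentDocIds` helper and its sample values are hypothetical.

```csharp
public static class SegmentDocIds
{
    // Converts a segment-relative docID to a top-level docID.
    public static int ToTopLevel(int docBase, int segmentDocId)
    {
        return docBase + segmentDocId;
    }

    // Finds the segment that contains a top-level docID and the docID
    // relative to that segment. docBases must be ascending and start at 0.
    public static (int Segment, int SegmentDocId) ToSegment(int[] docBases, int topLevelDocId)
    {
        int segment = 0;
        for (int i = 1; i < docBases.Length; i++)
        {
            if (docBases[i] <= topLevelDocId)
            {
                segment = i;
            }
        }
        return (segment, topLevelDocId - docBases[segment]);
    }
}
```

With segment offsets of 0, 100, and 250, top-level document 130 is document 30 of the second segment, and a DocIdSet returned for that segment would contain 30, not 130.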

Type	Name	Description
AtomicReaderContext	context	An AtomicReaderContext instance opened on the index currently being searched. Note that the provided reader is unlikely to represent the whole underlying index: if the index has more than one segment, the given reader represents only a single segment. The provided context is always an atomic context, so you can call Fields on the context's reader, for example.
IBits	acceptDocs	IBits that represent the docs allowed to match (typically excluding deleted docs, but possibly filtering out other documents as well)

Inheritance

Inherited Members

Namespace: Lucene.Net.Search

Assembly: Lucene.Net.dll

Syntax

Constructors

FieldCacheTermsFilter(string, params BytesRef[])

Declaration

Parameters

FieldCacheTermsFilter(string, params string[])

Declaration

Parameters

Properties

FieldCache

Declaration

Property Value

Methods

GetDocIdSet(AtomicReaderContext, IBits)

Declaration

Parameters

Returns

Overrides