Class BlockTreeTermsReader<TSubclassState>
A block-based terms index and dictionary that assigns terms to variable-length blocks according to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that SeekExact() can often determine that a term cannot exist without doing any I/O, and intersection with Automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
NOTE: this terms dictionary does not support index divisor when opening an IndexReader. Instead, you can change the min/maxItemsPerBlock during indexing.
The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
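The idea that a prefix index can reject a term before any block is read can be illustrated with a deliberately simplified toy model. This is not the Lucene.NET implementation or its file format: ToyBlockIndex, AddBlock and BlockLoads are hypothetical names, and a plain dictionary stands in for the prefix trie.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a dictionary plays the role of the in-memory prefix trie,
// and each value plays the role of an on-disk term block.
public sealed class ToyBlockIndex
{
    private readonly Dictionary<string, string[]> blocksByPrefix =
        new Dictionary<string, string[]>();

    // Counts simulated block reads, standing in for disk I/O.
    public int BlockLoads { get; private set; }

    public void AddBlock(string prefix, params string[] terms)
    {
        blocksByPrefix[prefix] = terms;
    }

    public bool SeekExact(string term)
    {
        // Look for the longest indexed prefix of the term, using in-memory data only.
        for (int len = term.Length; len >= 0; len--)
        {
            if (blocksByPrefix.TryGetValue(term.Substring(0, len), out string[] block))
            {
                BlockLoads++; // only at this point would a block be read
                return Array.IndexOf(block, term) >= 0;
            }
        }
        return false; // no block can hold the term, so nothing was read
    }
}

public static class ToyBlockIndexDemo
{
    public static void Main()
    {
        var index = new ToyBlockIndex();
        index.AddBlock("ap", "ape", "apple", "apply");
        index.AddBlock("ba", "bad", "bat");

        Console.WriteLine(index.SeekExact("apple")); // True, after one block load
        Console.WriteLine(index.SeekExact("zebra")); // False, rejected by the index alone
        Console.WriteLine(index.BlockLoads);         // 1
    }
}
```

The real reader additionally splits oversized blocks and nests blocks inside blocks, which this sketch leaves out.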
Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary.
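The same verbose check can presumably be driven from code. The sketch below assumes the Lucene.NET 4.8 CheckIndex members InfoStream, InfoStreamIsVerbose and DoCheckIndex(), plus Status.Clean; treat those names as assumptions and confirm them against the version you use. BlockStatsDump and indexPath are hypothetical.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class BlockStatsDump
{
    // indexPath is a hypothetical path to an existing index directory.
    public static bool Run(string indexPath)
    {
        using (FSDirectory dir = FSDirectory.Open(indexPath))
        {
            var checker = new CheckIndex(dir);

            // Assumed API: route the report to the console and enable verbose output,
            // which is where the per-field block statistics are expected to appear.
            checker.InfoStream = Console.Out;
            checker.InfoStreamIsVerbose = true;

            CheckIndex.Status status = checker.DoCheckIndex();
            return status.Clean; // true when no problems were detected
        }
    }
}
```

Checking an index reads every segment, so this is better suited to diagnostics than to a hot path.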
See BlockTreeTermsWriter<TSubclassState>.
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Codecs
Assembly: Lucene.Net.dll
Syntax
public class BlockTreeTermsReader<TSubclassState> : FieldsProducer, IEnumerable<string>, IEnumerable, IDisposable
Type Parameters
Name | Description |
---|---|
TSubclassState |
Constructors
BlockTreeTermsReader(Directory, FieldInfos, SegmentInfo, PostingsReaderBase, IOContext, string, int, TSubclassState)
Sole constructor.
Declaration
public BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, SegmentInfo info, PostingsReaderBase postingsReader, IOContext ioContext, string segmentSuffix, int indexDivisor, TSubclassState subclassState)
Parameters
Type | Name | Description |
---|---|---|
Directory | dir | |
FieldInfos | fieldInfos | |
SegmentInfo | info | |
PostingsReaderBase | postingsReader | |
IOContext | ioContext | |
string | segmentSuffix | |
int | indexDivisor | |
TSubclassState | subclassState | LUCENENET specific parameter that allows a subclass to set state. It is optional and is useful when overriding the ReadHeader(), ReadIndexHeader() and SeekDir() methods. It only matters when state passed to the subclass constructor is needed inside any of those methods. |
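A subclass that needs its own state inside one of the overridable methods might look like the sketch below. The constructor and ReadHeader signatures are copied from this page; HeaderLogState, its Label property and LoggingBlockTreeTermsReader are hypothetical, and the sketch assumes that the base class stores subclassState in m_subclassState before it calls ReadHeader().

```csharp
using System;
using Lucene.Net.Codecs;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;

// Hypothetical state object handed through the base constructor.
public sealed class HeaderLogState
{
    public string Label { get; set; }
}

public class LoggingBlockTreeTermsReader : BlockTreeTermsReader<HeaderLogState>
{
    public LoggingBlockTreeTermsReader(
        Directory dir, FieldInfos fieldInfos, SegmentInfo info,
        PostingsReaderBase postingsReader, IOContext ioContext,
        string segmentSuffix, int indexDivisor, HeaderLogState state)
        : base(dir, fieldInfos, info, postingsReader, ioContext,
               segmentSuffix, indexDivisor, state)
    {
    }

    protected override int ReadHeader(IndexInput input)
    {
        // Delegate the actual header parsing to the base implementation.
        int result = base.ReadHeader(input);

        // Fields declared on this subclass may not be initialized yet if the base
        // constructor invokes this method, so the state is read from m_subclassState.
        Console.WriteLine($"[{m_subclassState?.Label}] ReadHeader returned {result}");
        return result;
    }
}
```

Passing the state through the base constructor avoids relying on subclass fields that are assigned only after the base constructor has finished.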
Fields
m_subclassState
LUCENENET specific state that was passed to the constructor as subclassState, made available to subclasses that override ReadHeader(), ReadIndexHeader() or SeekDir().
Declaration
protected readonly TSubclassState m_subclassState
Field Value
Type | Description |
---|---|
TSubclassState |
Properties
Count
Gets the number of fields, or -1 if the number of distinct field names is unknown. If the value is >= 0, GetEnumerator() will return that many field names.
NOTE: This was size() in Lucene.
Declaration
public override int Count { get; }
Property Value
Type | Description |
---|---|
int |
Overrides
Methods
CheckIntegrity()
Checks consistency of this reader.
Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
Declaration
public override void CheckIntegrity()
Overrides
Dispose(bool)
Disposes all resources used by this object.
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing |
Overrides
GetEnumerator()
Returns an enumerator that will step through all field names. This will not return null.
Declaration
public override IEnumerator<string> GetEnumerator()
Returns
Type | Description |
---|---|
IEnumerator<string> |
Overrides
GetTerms(string)
Get the Terms for this field. This will return null if the field does not exist.
Declaration
public override Terms GetTerms(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
Terms |
Overrides
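Because GetTerms(string) returns null for a missing field, callers are expected to check the result before using it. The helper below is a minimal sketch written against the Fields base type in Lucene.Net.Index, which this reader derives from through FieldsProducer; FieldsSummary and Print are hypothetical names, and Terms.Count is assumed to be the Lucene.NET equivalent of Lucene's Terms.size().

```csharp
using System;
using Lucene.Net.Index;

public static class FieldsSummary
{
    public static void Print(Fields fields)
    {
        // Count may be -1 when the number of distinct field names is unknown.
        Console.WriteLine($"fields: {fields.Count}");

        // The reader is enumerable over its field names.
        foreach (string field in fields)
        {
            Terms terms = fields.GetTerms(field);
            if (terms == null)
            {
                continue; // defensive check: null means the field does not exist
            }

            // Assumed property: Terms.Count may also report -1 if the codec does not track it.
            Console.WriteLine($"{field}: {terms.Count} terms");
        }
    }
}
```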
RamBytesUsed()
Returns approximate RAM bytes used.
Declaration
public override long RamBytesUsed()
Returns
Type | Description |
---|---|
long |
Overrides
ReadHeader(IndexInput)
Reads terms file header.
Declaration
protected virtual int ReadHeader(IndexInput input)
Parameters
Type | Name | Description |
---|---|---|
IndexInput | input |
Returns
Type | Description |
---|---|
int |
ReadIndexHeader(IndexInput)
Reads index file header.
Declaration
protected virtual int ReadIndexHeader(IndexInput input)
Parameters
Type | Name | Description |
---|---|---|
IndexInput | input |
Returns
Type | Description |
---|---|
int |
SeekDir(IndexInput, long)
Seek input to the directory offset.
Declaration
protected virtual void SeekDir(IndexInput input, long dirOffset)
Parameters
Type | Name | Description |
---|---|---|
IndexInput | input | |
long | dirOffset |