Class BlockTreeTermsReader<TSubclassState>

A block-based terms index and dictionary that assigns terms to variable-length blocks according to the prefixes they share. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that SeekExact() can often determine that a term does not exist without doing any I/O, and intersection with automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
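To make the "no I/O" claim concrete, here is a minimal, self-contained sketch of the idea, not the actual Lucene.NET implementation. The class PrefixBlockIndex, its AddBlock method, and the BlockReads counter are hypothetical names invented for this illustration. Blocks are modeled as in-memory sets, and a "block read" is simulated by incrementing a counter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model: an in-memory prefix trie that points at term blocks.
// Real blocks live on disk; here a block is a HashSet and a "read" is a counter bump.
public sealed class PrefixBlockIndex
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public int BlockId = -1; // -1 means no block is registered for this prefix
    }

    private readonly Node root = new Node();
    private readonly Dictionary<int, HashSet<string>> blocks =
        new Dictionary<int, HashSet<string>>();

    // Number of simulated block reads performed so far.
    public int BlockReads { get; private set; }

    // Registers a block that holds the given terms, all sharing the given prefix.
    public void AddBlock(string prefix, params string[] terms)
    {
        Node node = root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out Node next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.BlockId = blocks.Count;
        blocks[node.BlockId] = new HashSet<string>(terms, StringComparer.Ordinal);
    }

    // Returns true if the term exists. Terms whose prefix is absent from the trie
    // are rejected before any block is read.
    public bool SeekExact(string term)
    {
        Node node = root;
        int blockId = root.BlockId;
        foreach (char c in term)
        {
            if (!node.Children.TryGetValue(c, out node)) break; // trie path ends here
            if (node.BlockId >= 0) blockId = node.BlockId;      // deepest block so far
        }
        if (blockId < 0) return false; // no candidate block: answered from the index alone

        BlockReads++; // simulated I/O: only now is a block consulted
        return blocks[blockId].Contains(term);
    }
}
```

With blocks registered for the prefixes "ap" and "ban", a lookup for "cherry" returns false while BlockReads stays at 0. A lookup for "apricot" still has to read the "ap" block before it can answer, which is why the description says "often" rather than "always".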

NOTE: this terms dictionary does not support index divisor when opening an IndexReader. Instead, you can change the min/maxItemsPerBlock during indexing.

The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.

Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary.

See BlockTreeTermsWriter<TSubclassState>.

Note

This API is experimental and might change in incompatible ways in the next release.

Inheritance

object

Fields

FieldsProducer

BlockTreeTermsReader<TSubclassState>

Implements

IEnumerable<string>

IEnumerable

IDisposable

Inherited Members

FieldsProducer.Dispose()

Fields.UniqueTermCount

Fields.EMPTY_ARRAY

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Codecs

Assembly: Lucene.Net.dll

Syntax

public class BlockTreeTermsReader<TSubclassState> : FieldsProducer, IEnumerable<string>, IEnumerable, IDisposable

Type Parameters

Name	Description
TSubclassState

Constructors

BlockTreeTermsReader(Directory, FieldInfos, SegmentInfo, PostingsReaderBase, IOContext, string, int, TSubclassState)

Sole constructor.

Declaration

public BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, SegmentInfo info, PostingsReaderBase postingsReader, IOContext ioContext, string segmentSuffix, int indexDivisor, TSubclassState subclassState)

Parameters

Type	Name	Description
Directory	dir
FieldInfos	fieldInfos
SegmentInfo	info
PostingsReaderBase	postingsReader
IOContext	ioContext
string	segmentSuffix
int	indexDivisor
TSubclassState	subclassState	LUCENENET-specific parameter that lets a subclass set state. It is optional and is intended for use when overriding ReadHeader(IndexInput), ReadIndexHeader(IndexInput), and SeekDir(IndexInput, long). It only matters when one of those overrides needs state that is passed to the subclass constructor. The value is assigned to the protected field m_subclassState before any of those methods are called, so it is available for reading inside the overrides. If your subclass needs to pass more than one piece of data, you can create a class or struct to hold it. No other virtual members of BlockTreeTermsReader are called in the constructor, so overrides of those members do not need this field (although they can use it for consistency).
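The reason for the subclassState parameter can be shown with a small, self-contained sketch of the underlying C# behavior. The types ReaderBase<TState> and VersionedReader are hypothetical stand-ins, not Lucene.NET types, and m_state only mirrors the role of m_subclassState. In C#, a base constructor runs before the body of the derived constructor, so a virtual method called from the base constructor cannot see fields that the derived constructor assigns.

```csharp
using System;

// Hypothetical stand-in for a base reader that calls a virtual method in its constructor.
public abstract class ReaderBase<TState>
{
    // Plays the role that m_subclassState plays in the documentation above.
    protected readonly TState m_state;

    public int HeaderVersion { get; }

    protected ReaderBase(TState state)
    {
        m_state = state;              // assigned first...
        HeaderVersion = ReadHeader(); // ...so the override below can read it
    }

    protected virtual int ReadHeader() => 0;
}

// Hypothetical subclass that needs constructor-supplied data inside ReadHeader().
public sealed class VersionedReader : ReaderBase<int>
{
    private readonly int lateField;

    // Records what ReadHeader() observed in lateField when it ran.
    public int LateFieldSeenByReadHeader { get; private set; }

    public VersionedReader(int minVersion) : base(minVersion)
    {
        // This body runs only after the base constructor has already called ReadHeader().
        lateField = minVersion;
    }

    protected override int ReadHeader()
    {
        LateFieldSeenByReadHeader = lateField; // still the default value (0) at this point
        return m_state;                        // already set by the base constructor
    }
}
```

Constructing new VersionedReader(7) yields HeaderVersion == 7, because the state travelled through the base constructor, while LateFieldSeenByReadHeader is 0, because the derived field had not been assigned yet when the override ran.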

Fields

m_subclassState

Declaration

protected readonly TSubclassState m_subclassState

Field Value

Type	Description
TSubclassState

Properties

Count

Gets the number of fields, or -1 if the number of distinct field names is unknown. If the value is >= 0, GetEnumerator() returns exactly that many field names.

NOTE: This was size() in Lucene.

Declaration

public override int Count { get; }

Property Value

Type	Description
int

Overrides

Fields.Count

Methods

CheckIntegrity()

Checks consistency of this reader.

Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

Declaration

public override void CheckIntegrity()

Overrides

FieldsProducer.CheckIntegrity()

Dispose(bool)

Disposes all resources used by this object.

Declaration

protected override void Dispose(bool disposing)

Parameters

Type	Name	Description
bool	disposing

Overrides

FieldsProducer.Dispose(bool)

GetEnumerator()

Returns an enumerator that will step through all field names. This will not return null.

Declaration

public override IEnumerator<string> GetEnumerator()

Returns

Type	Description
IEnumerator<string>

Overrides

Fields.GetEnumerator()

GetTerms(string)

Get the Terms for this field. This will return null if the field does not exist.

Declaration

public override Terms GetTerms(string field)

Parameters

Type	Name	Description
string	field

Returns

Type	Description
Terms

Overrides

Fields.GetTerms(string)

RamBytesUsed()

Returns approximate RAM bytes used.

Declaration

public override long RamBytesUsed()

Returns

Type	Description
long

Overrides

FieldsProducer.RamBytesUsed()

ReadHeader(IndexInput)

Reads terms file header.

Declaration

protected virtual int ReadHeader(IndexInput input)

Parameters

Type	Name	Description
IndexInput	input

Returns

Type	Description
int

ReadIndexHeader(IndexInput)

Reads index file header.

Declaration

protected virtual int ReadIndexHeader(IndexInput input)

Parameters

Type	Name	Description
IndexInput	input

Returns

Type	Description
int

SeekDir(IndexInput, long)

Seek input to the directory offset.

Declaration

protected virtual void SeekDir(IndexInput input, long dirOffset)

Parameters

Type	Name	Description
IndexInput	input
long	dirOffset

Implements

IEnumerable<T>

IEnumerable

IDisposable