
    Class BlockTreeTermsReader<TSubclassState>

    A block-based terms index and dictionary that assigns terms to variable-length blocks according to the prefixes they share. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that SeekExact() can often determine that a term does not exist without doing any I/O, and intersection with Automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).

    NOTE: this terms dictionary does not support index divisor when opening an IndexReader. Instead, you can change the min/maxItemsPerBlock during indexing.

    The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
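
The idea of a prefix trie whose leaves are term blocks can be sketched with a toy in-memory model. This is only an illustration of the concept under simplifying assumptions (prefixes are assigned by hand and no prefix extends another); `ToyPrefixIndex`, `Add`, and `BlockReads` are hypothetical names, not Lucene.NET types, and the real on-disk format is considerably more involved.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a prefix index whose leaves point at "blocks" of terms.
// All names are hypothetical; this is not how Lucene.NET stores its index.
public sealed class ToyPrefixIndex
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public SortedSet<string> Block; // non-null only on leaf nodes
    }

    private readonly Node root = new Node();

    // Counts how often a block had to be opened (a stand-in for disk I/O).
    public int BlockReads { get; private set; }

    // Registers a term in the block owned by the given prefix.
    // Assumption: no registered prefix is a prefix of another one.
    public void Add(string prefix, string term)
    {
        if (!term.StartsWith(prefix, StringComparison.Ordinal))
            throw new ArgumentException("term must start with prefix");

        Node node = root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out Node next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        if (node.Block == null)
            node.Block = new SortedSet<string>(StringComparer.Ordinal);
        node.Block.Add(term);
    }

    // Returns true if the term exists. When no block covers the term's
    // prefix, the answer comes from the index alone and no block is read.
    public bool SeekExact(string term)
    {
        Node node = root;
        foreach (char c in term)
        {
            if (node.Block != null) break;            // reached the leaf that covers this prefix
            if (!node.Children.TryGetValue(c, out Node next))
                return false;                          // rejected without touching any block
            node = next;
        }
        if (node.Block == null) return false;          // term is shorter than every indexed prefix
        BlockReads++;
        return node.Block.Contains(term);
    }
}
```

In this sketch a lookup for a term whose prefix is absent from the trie returns `false` while `BlockReads` stays unchanged, which mirrors the "no I/O" case described above.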

    Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary.

    See BlockTreeTermsWriter<TSubclassState>.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    Fields
    FieldsProducer
    BlockTreeTermsReader<TSubclassState>
    Implements
    IEnumerable<string>
    IEnumerable
    IDisposable
    Inherited Members
    FieldsProducer.Dispose()
    Fields.UniqueTermCount
    Fields.EMPTY_ARRAY
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Codecs
    Assembly: Lucene.Net.dll
    Syntax
    public class BlockTreeTermsReader<TSubclassState> : FieldsProducer, IEnumerable<string>, IEnumerable, IDisposable
    Type Parameters
    Name Description
    TSubclassState

    Constructors

    BlockTreeTermsReader(Directory, FieldInfos, SegmentInfo, PostingsReaderBase, IOContext, string, int, TSubclassState)

    Sole constructor.

    Declaration
    public BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, SegmentInfo info, PostingsReaderBase postingsReader, IOContext ioContext, string segmentSuffix, int indexDivisor, TSubclassState subclassState)
    Parameters
    Type Name Description
    Directory dir
    FieldInfos fieldInfos
    SegmentInfo info
    PostingsReaderBase postingsReader
    IOContext ioContext
    string segmentSuffix
    int indexDivisor
    TSubclassState subclassState

    LUCENENET-specific parameter that lets a subclass pass state to the base constructor. It is optional and is only needed when an override of ReadHeader(), ReadIndexHeader(), or SeekDir() requires state that was passed to the subclass constructor, because the base constructor calls those methods.

     The value is assigned to the protected field m_subclassState before any of those methods are called, so overrides can read it from there.
    
     If your subclass needs to pass more than one piece of data, create a class or struct to hold it.
     No other virtual members of BlockTreeTermsReader are called in the constructor,
     so their overrides do not need this field (although they may use it for consistency).
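
The reason such a parameter is useful follows from C# construction order: a derived constructor body runs only after the base constructor has finished, so anything the derived class assigns there is not yet visible to a virtual method the base constructor calls. The following self-contained sketch demonstrates this with hypothetical types (`ToyReader<TState>`, `MaxVersionState`, `ToyCustomReader`); it mimics the pattern and is not the actual BlockTreeTermsReader implementation.

```csharp
using System;

// Hypothetical stand-in for a base class that calls a virtual member
// from its constructor, as described for ReadHeader() and friends.
public abstract class ToyReader<TState>
{
    protected readonly TState m_state;

    public int Version { get; }

    protected ToyReader(TState state)
    {
        m_state = state;        // stored first...
        Version = ReadHeader(); // ...so the override can already read it here
    }

    protected virtual int ReadHeader() => 0;
}

// Bundles the data the override needs (use a class or struct for several values).
public sealed class MaxVersionState
{
    public int MaxVersion { get; }
    public MaxVersionState(int maxVersion) { MaxVersion = maxVersion; }
}

public sealed class ToyCustomReader : ToyReader<MaxVersionState>
{
    private readonly int lateField;

    // Records what ReadHeader() observed in lateField during base construction.
    public int LateFieldSeenByHeader { get; private set; } = -1;

    public ToyCustomReader(int maxVersion)
        : base(new MaxVersionState(maxVersion))
    {
        lateField = maxVersion; // runs only after the base constructor returned
    }

    protected override int ReadHeader()
    {
        LateFieldSeenByHeader = lateField; // still the default value 0 at this point
        return m_state.MaxVersion;         // state handed to the base constructor is available
    }
}
```

With `new ToyCustomReader(7)`, `Version` is 7 because the state travelled through the base constructor, while `LateFieldSeenByHeader` is 0 because the derived field had not been assigned yet when the override ran.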
    

    Fields

    m_subclassState

    The state passed to the constructor as subclassState. It is assigned before ReadHeader(), ReadIndexHeader(), and SeekDir() are called, so overrides of those methods can read it.

    Declaration
    protected readonly TSubclassState m_subclassState
    Field Value
    Type Description
    TSubclassState

    Properties

    Count

    Gets the number of fields, or -1 if the number of distinct field names is unknown. If the value is >= 0, GetEnumerator() returns exactly that many field names.

    NOTE: This was size() in Lucene.
    Declaration
    public override int Count { get; }
    Property Value
    Type Description
    int
    Overrides
    Fields.Count

    Methods

    CheckIntegrity()

    Checks consistency of this reader.

    Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.

    Note

    This API is for internal purposes only and might change in incompatible ways in the next release.

    Declaration
    public override void CheckIntegrity()
    Overrides
    FieldsProducer.CheckIntegrity()

    Dispose(bool)

    Disposes all resources used by this object.

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing
    Overrides
    FieldsProducer.Dispose(bool)

    GetEnumerator()

    Returns an enumerator that steps through all field names. Never returns null.

    Declaration
    public override IEnumerator<string> GetEnumerator()
    Returns
    Type Description
    IEnumerator<string>
    Overrides
    Fields.GetEnumerator()

    GetTerms(string)

    Gets the Terms for the specified field, or null if the field does not exist.

    Declaration
    public override Terms GetTerms(string field)
    Parameters
    Type Name Description
    string field
    Returns
    Type Description
    Terms
    Overrides
    Fields.GetTerms(string)
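
Because GetTerms(string) may return null, callers typically guard each lookup. The helper below is a minimal sketch that relies only on the members documented on this page (enumeration of field names and GetTerms(string)); `FieldsInspector` and `FieldsWithTerms` are hypothetical names, and the sketch assumes the Lucene.Net package is referenced and that `Fields` and `Terms` live in the `Lucene.Net.Index` namespace.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;

// Hypothetical helper, not part of Lucene.NET.
public static class FieldsInspector
{
    // Returns the names of all fields for which a Terms instance is available.
    public static List<string> FieldsWithTerms(Fields fields)
    {
        var result = new List<string>();
        foreach (string field in fields)          // Fields implements IEnumerable<string>
        {
            Terms terms = fields.GetTerms(field); // null if the field does not exist
            if (terms != null)
            {
                result.Add(field);
            }
        }
        return result;
    }
}
```

Any `Fields` instance can be passed in, including a BlockTreeTermsReader, since the helper uses only members inherited from `Fields`.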

    RamBytesUsed()

    Returns approximate RAM bytes used.

    Declaration
    public override long RamBytesUsed()
    Returns
    Type Description
    long
    Overrides
    FieldsProducer.RamBytesUsed()

    ReadHeader(IndexInput)

    Reads the terms file header.

    Declaration
    protected virtual int ReadHeader(IndexInput input)
    Parameters
    Type Name Description
    IndexInput input
    Returns
    Type Description
    int

    ReadIndexHeader(IndexInput)

    Reads the index file header.

    Declaration
    protected virtual int ReadIndexHeader(IndexInput input)
    Parameters
    Type Name Description
    IndexInput input
    Returns
    Type Description
    int

    SeekDir(IndexInput, long)

    Seeks input to the directory offset.

    Declaration
    protected virtual void SeekDir(IndexInput input, long dirOffset)
    Parameters
    Type Name Description
    IndexInput input
    long dirOffset

    Implements

    IEnumerable<T>
    IEnumerable
    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.