    Class FSTTermsWriter

    FST-based term dict, using metadata as FST output.

    The FST directly holds the mapping between <term, metadata>.

    Term metadata consists of three parts:

    1. term statistics: docFreq, totalTermFreq;
    2. monotonic long[], e.g. the pointer to the postings list for that term;
    3. generic byte[], e.g. other information needed by the postings reader.
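
    The three parts listed above can be pictured as one small value type. The following is a minimal sketch only: `TermMetadata` and its members are hypothetical names chosen for illustration and are not the type the library uses internally.

```csharp
using System;

// Hypothetical container for the three parts of per-term metadata.
// The name and shape are assumptions made for this illustration.
public sealed class TermMetadata
{
    // 1. Term statistics.
    public int DocFreq { get; }
    public long TotalTermFreq { get; }

    // 2. Monotonic part, e.g. a pointer into the postings list.
    public long[] Longs { get; }

    // 3. Generic part, opaque to the term dictionary.
    public byte[] Bytes { get; }

    public TermMetadata(int docFreq, long totalTermFreq, long[] longs, byte[] bytes)
    {
        DocFreq = docFreq;
        TotalTermFreq = totalTermFreq;
        // Treat a missing part as empty so callers never see null.
        Longs = longs ?? Array.Empty<long>();
        Bytes = bytes ?? Array.Empty<byte>();
    }
}
```

    For example, `new TermMetadata(3, 7, new long[] { 1024 }, null)` would describe a term that occurs in 3 documents, 7 times in total, with a single monotonic value and no generic bytes.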

    File:

    • .tst: Term Dictionary

    Term Dictionary

    The .tst file contains a list of FSTs, one for each field. Each FST maps a term to its corresponding statistics (e.g. docFreq) and metadata (e.g. information for the postings list reader, such as the file pointer to the postings list).

    Typically the metadata is separated into two parts:

    • Monotonic long array: Some metadata always ascends in the same order as the corresponding terms. The FST uses this part to share outputs between arcs.
    • Generic byte array: Used to store non-monotonic metadata.
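
    To illustrate why monotonic values help the FST share outputs, the sketch below factors a common part out of two long arrays and keeps only the remainders. It is a simplified illustration under stated assumptions: `MonotonicOutputs`, `Common`, `Subtract` and `Add` are hypothetical helpers written for this page, and the actual output algebra used by the library may differ in its details.

```csharp
using System;

// Hypothetical helpers illustrating output sharing on long[] metadata.
public static class MonotonicOutputs
{
    // Returns the part two outputs can share: the pointwise-smaller array
    // when one array is <= the other at every position, otherwise null.
    public static long[] Common(long[] a, long[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("arrays must have the same length");
        bool aLeB = true, bLeA = true;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] > b[i]) aLeB = false;
            if (b[i] > a[i]) bLeA = false;
        }
        if (aLeB) return (long[])a.Clone();
        if (bLeA) return (long[])b.Clone();
        return null; // not comparable, nothing to share
    }

    // Removes the shared part, leaving the remainder stored on a deeper arc.
    public static long[] Subtract(long[] output, long[] common)
    {
        var result = new long[output.Length];
        for (int i = 0; i < output.Length; i++) result[i] = output[i] - common[i];
        return result;
    }

    // Rebuilds the full output while walking the arcs from the root.
    public static long[] Add(long[] prefix, long[] remainder)
    {
        var result = new long[prefix.Length];
        for (int i = 0; i < prefix.Length; i++) result[i] = prefix[i] + remainder[i];
        return result;
    }
}
```

    With outputs `{100, 40}` and `{180, 75}`, the shared part is `{100, 40}` and the second term only needs the remainder `{80, 35}` further down. Because the values ascend with term order, such a shared part usually exists near the root.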

    File format:

    • TermsDict(.tst) --> Header, PostingsHeader, FieldSummary, DirOffset
    • FieldSummary --> NumFields, <FieldNumber, NumTerms, SumTotalTermFreq?, SumDocFreq, DocCount, LongsSize, TermFST>^NumFields
    • TermFST --> FST<TermData>
    • TermData --> Flag, BytesSize?, LongDelta^LongsSize?, Byte^BytesSize?, <DocFreq[Same?], (TotalTermFreq-DocFreq)>?
    • Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
    • DirOffset --> Uint64 (WriteInt64(Int64))
    • DocFreq, LongsSize, BytesSize, NumFields, FieldNumber, DocCount --> VInt (WriteVInt32(Int32))
    • TotalTermFreq, NumTerms, SumTotalTermFreq, SumDocFreq, LongDelta --> VLong (WriteVInt64(Int64))
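
    The VInt and VLong entries above use a variable-length encoding: seven payload bits per byte, low-order bits first, with the high bit set on every byte except the last. The sketch below shows that scheme on a plain byte list. `VarIntSketch` and its methods are hypothetical stand-ins written for illustration; they are not the library's `DataOutput` implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for variable-length integer encoding.
public static class VarIntSketch
{
    // Appends a non-negative value using 7 bits per byte, low-order first.
    public static void WriteVInt64(List<byte> output, long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        ulong v = (ulong)value;
        while ((v & ~0x7FUL) != 0)
        {
            // More bytes follow: emit 7 bits and set the continuation bit.
            output.Add((byte)((v & 0x7FUL) | 0x80UL));
            v >>= 7;
        }
        output.Add((byte)v); // last byte, continuation bit clear
    }

    // Reads one value starting at pos and advances pos past it.
    public static long ReadVInt64(IReadOnlyList<byte> input, ref int pos)
    {
        long result = 0;
        int shift = 0;
        while (true)
        {
            byte b = input[pos++];
            result |= (long)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }
}
```

    Small values therefore cost one byte (127 becomes `7F`), while 128 takes two (`80 01`). This is why delta-encoded fields such as LongDelta and TotalTermFreq-DocFreq tend to stay compact.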

    Notes:

    • The formats of PostingsHeader and of the generic metadata bytes are defined by the specific postings implementation: they contain arbitrary per-file data (such as parameters or versioning information) and per-term data (non-monotonic data such as pulsed postings).
    • The format of TermData is determined by the FST: typically, monotonic metadata is dense around shallow arcs, while deeper arcs carry only generic bytes and term statistics.
    • The Flag byte indicates which parts of the metadata exist on the current arc. In particular, the monotonic part is omitted when it is an array of zeros.
    • Since LongsSize is fixed per field, it is written only once, in the field summary.
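
    As a rough illustration of the Flag byte described in the notes, the sketch below derives a flag from the three metadata parts. The bit positions and the "docFreq greater than zero" test for statistics are assumptions made for this example, and `ArcFlagSketch` is a hypothetical name; the source code defines the real layout.

```csharp
// Hypothetical flag computation; bit positions are assumed for illustration.
public static class ArcFlagSketch
{
    public const int HasLongs = 1 << 0; // monotonic part present
    public const int HasBytes = 1 << 1; // generic bytes present
    public const int HasStats = 1 << 2; // term statistics present

    public static byte Compute(long[] longs, byte[] bytes, int docFreq)
    {
        int flag = 0;
        // The monotonic part is omitted when it is an array of zeros.
        if (!AllZero(longs)) flag |= HasLongs;
        if (bytes != null && bytes.Length > 0) flag |= HasBytes;
        // Assumption: statistics count as present when docFreq is positive.
        if (docFreq > 0) flag |= HasStats;
        return (byte)flag;
    }

    private static bool AllZero(long[] longs)
    {
        foreach (long value in longs)
        {
            if (value != 0) return false;
        }
        return true;
    }
}
```

    Under these assumptions, an arc that carries only generic bytes and statistics gets a flag with the first bit clear, so a reader would skip the LongDelta values entirely.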

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
    Inheritance
    System.Object
    Lucene.Net.Codecs.FieldsConsumer
    FSTTermsWriter
    Implements
    System.IDisposable
    Inherited Members
    Lucene.Net.Codecs.FieldsConsumer.Dispose()
    Lucene.Net.Codecs.FieldsConsumer.Merge(Lucene.Net.Index.MergeState, Lucene.Net.Index.Fields)
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Codecs.Memory
    Assembly: Lucene.Net.Codecs.dll
    Syntax
    public class FSTTermsWriter : FieldsConsumer, IDisposable

    Constructors


    FSTTermsWriter(SegmentWriteState, PostingsWriterBase)

    Declaration
    public FSTTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter)
    Parameters
    Type Name Description
    Lucene.Net.Index.SegmentWriteState state
    Lucene.Net.Codecs.PostingsWriterBase postingsWriter

    Fields


    TERMS_VERSION_CHECKSUM

    Declaration
    public const int TERMS_VERSION_CHECKSUM = 1
    Field Value
    Type Description
    System.Int32

    TERMS_VERSION_CURRENT

    Declaration
    public const int TERMS_VERSION_CURRENT = 1
    Field Value
    Type Description
    System.Int32

    TERMS_VERSION_START

    Declaration
    public const int TERMS_VERSION_START = 0
    Field Value
    Type Description
    System.Int32

    Methods


    AddField(FieldInfo)

    Declaration
    public override TermsConsumer AddField(FieldInfo field)
    Parameters
    Type Name Description
    Lucene.Net.Index.FieldInfo field
    Returns
    Type Description
    Lucene.Net.Codecs.TermsConsumer
    Overrides
    Lucene.Net.Codecs.FieldsConsumer.AddField(Lucene.Net.Index.FieldInfo)

    Dispose(Boolean)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing
    Overrides
    FieldsConsumer.Dispose(Boolean)

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)