
    Class FSTTermsWriter

    FST-based term dict, using metadata as FST output.

    The FST directly holds the mapping between <term, metadata>.

    Term metadata consists of three parts:

    1. term statistics: docFreq, totalTermFreq;
    2. monotonic long[], e.g. the pointer to the postings list for that term;
    3. generic byte[], e.g. other information needed by the postings reader.

    File:

    • .tst: Term Dictionary

    Term Dictionary

    The .tst file contains a list of FSTs, one for each field. Each FST maps a term to its corresponding statistics (e.g. docFreq) and metadata (e.g. information for the postings list reader, such as the file pointer to the postings list).

    Typically the metadata is separated into two parts:

    • Monotonic long array: Some metadata values always ascend in the same order as their terms. The FST uses this part to share outputs between arcs.
    • Generic byte array: Used to store non-monotonic metadata.
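The idea of sharing monotonic outputs between arcs can be illustrated with a simplified sketch. It is not the code Lucene.NET uses internally; the class and method names are hypothetical, and the sharing rule (element-wise minimum) is an assumption chosen to keep the example short. The shared part is stored once on a common arc, and each deeper arc keeps only its remainder.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class MonotonicSharing
{
    // Element-wise minimum: the portion two outputs can share on a common arc.
    public static long[] Common(long[] a, long[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Both arrays must have LongsSize elements.");
        var shared = new long[a.Length];
        for (int i = 0; i < a.Length; i++)
            shared[i] = Math.Min(a[i], b[i]);
        return shared;
    }

    // Remainder left for a deeper arc once the shared part is factored out.
    public static long[] Subtract(long[] full, long[] shared)
    {
        var rest = new long[full.Length];
        for (int i = 0; i < full.Length; i++)
            rest[i] = full[i] - shared[i];
        return rest;
    }

    // Summing the parts along a path restores the full output.
    public static long[] Add(long[] prefix, long[] rest)
    {
        var full = new long[prefix.Length];
        for (int i = 0; i < prefix.Length; i++)
            full[i] = prefix[i] + rest[i];
        return full;
    }
}
```

Because the values only grow from one term to the next, the remainders are never negative, which is why this part of the metadata is kept separate from the generic bytes.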

    File format:

    • TermsDict(.tst) --> Header, PostingsHeader, FieldSummary, DirOffset
    • FieldSummary --> NumFields, <FieldNumber, NumTerms, SumTotalTermFreq?, SumDocFreq, DocCount, LongsSize, TermFST>^NumFields
    • TermFST --> FST<TermData>
    • TermData --> Flag, BytesSize?, LongDelta^LongsSize?, Byte^BytesSize?, <DocFreq[Same?], (TotalTermFreq-DocFreq)>?
    • Header --> CodecHeader
    • DirOffset --> Uint64
    • DocFreq, LongsSize, BytesSize, NumFields, FieldNumber, DocCount --> VInt
    • TotalTermFreq, NumTerms, SumTotalTermFreq, SumDocFreq, LongDelta --> VLong
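Many of the fields above are stored as VInt or VLong values. As a minimal sketch of that variable-length encoding (seven data bits per byte, low-order group first, high bit set when another byte follows), the following standalone helper may be useful. The class and method names are hypothetical and do not belong to Lucene.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone helper; illustrates the variable-length layout only.
public static class VarIntSketch
{
    // Encodes a non-negative value, 7 bits per byte, low-order bits first.
    public static byte[] EncodeVLong(long value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), "Value must be non-negative.");
        var bytes = new List<byte>();
        ulong v = (ulong)value;
        while ((v & ~0x7FUL) != 0)
        {
            // More bytes follow: emit 7 data bits with the continuation bit set.
            bytes.Add((byte)((v & 0x7F) | 0x80));
            v >>= 7;
        }
        bytes.Add((byte)v); // Final byte has the continuation bit clear.
        return bytes.ToArray();
    }

    // Decodes a value produced by EncodeVLong.
    public static long DecodeVLong(byte[] bytes)
    {
        long result = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            result |= (long)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }
}
```

Small values such as a typical DocFreq fit in a single byte, which is the main reason the format prefers this encoding over fixed-width integers.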

    Notes:

    • The formats of PostingsHeader and the generic meta bytes are defined by the specific postings implementation: they contain arbitrary per-file data (such as parameters or versioning information) and per-term data (non-monotonic data such as pulsed postings).
    • The format of TermData is determined by the FST. Typically, monotonic metadata is dense around shallow arcs, while deeper arcs hold only generic bytes and term statistics.
    • The Flag byte indicates which parts of the metadata exist on the current arc. In particular, the monotonic part is omitted when it is an array of all zeros.
    • Since LongsSize is fixed per field, it is written only once, in the field summary.
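The role of the Flag byte can be sketched as follows. The bit positions used here are an assumption made for illustration and may differ from the layout the actual writer uses; the class and constant names are hypothetical.

```csharp
using System;

// Hypothetical illustration; bit assignments are assumed, not taken from Lucene.NET.
public static class TermDataFlagSketch
{
    public const int HasLongs = 1 << 0; // monotonic long[] is present (not all zeros)
    public const int HasBytes = 1 << 1; // generic byte[] is present
    public const int HasStats = 1 << 2; // term statistics are present

    // Computes which parts of the metadata would be written for one arc.
    public static byte ComputeFlag(long[] longs, byte[] bytes, int docFreq)
    {
        int flag = 0;
        if (!AllZero(longs)) flag |= HasLongs;
        if (bytes != null && bytes.Length > 0) flag |= HasBytes;
        if (docFreq > 0) flag |= HasStats;
        return (byte)flag;
    }

    // The monotonic part is skipped entirely when every element is zero.
    private static bool AllZero(long[] longs)
    {
        foreach (long v in longs)
        {
            if (v != 0) return false;
        }
        return true;
    }
}
```

A reader would inspect the same bits to decide which of the optional TermData components to decode.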

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
    Inheritance
    System.Object
    FieldsConsumer
    FSTTermsWriter
    Inherited Members
    FieldsConsumer.Dispose()
    FieldsConsumer.Merge(MergeState, Fields)
    Namespace: Lucene.Net.Codecs.Memory
    Assembly: Lucene.Net.Codecs.dll
    Syntax
    public class FSTTermsWriter : FieldsConsumer

    Constructors


    FSTTermsWriter(SegmentWriteState, PostingsWriterBase)

    Declaration
    public FSTTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter)
    Parameters
    Type Name Description
    SegmentWriteState state
    PostingsWriterBase postingsWriter

    Fields


    TERMS_VERSION_CHECKSUM

    Declaration
    public const int TERMS_VERSION_CHECKSUM
    Field Value
    Type Description
    System.Int32

    TERMS_VERSION_CURRENT

    Declaration
    public const int TERMS_VERSION_CURRENT
    Field Value
    Type Description
    System.Int32

    TERMS_VERSION_START

    Declaration
    public const int TERMS_VERSION_START
    Field Value
    Type Description
    System.Int32

    Methods


    AddField(FieldInfo)

    Declaration
    public override TermsConsumer AddField(FieldInfo field)
    Parameters
    Type Name Description
    FieldInfo field
    Returns
    Type Description
    TermsConsumer
    Overrides
    FieldsConsumer.AddField(FieldInfo)

    Dispose(Boolean)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)