Class FSTTermsWriter
FST-based term dict, using metadata as FST output.
The FST directly holds the mapping between <term, metadata>.
Term metadata consists of three parts:
- term statistics: docFreq, totalTermFreq;
- monotonic long[], e.g. the pointer to the postings list for that term;
- generic byte[], e.g. other information needed by the postings reader.
File:
- .tst: Term Dictionary
Term Dictionary
The .tst file contains a list of FSTs, one for each field. Each FST maps a term to its statistics (e.g. docFreq) and metadata (e.g. information for the postings reader, such as the file pointer to the postings list).
Typically the metadata is separated into two parts:
- Monotonic long array: some metadata always ascends in the order of the corresponding terms. The FST uses this part to share outputs between arcs.
- Generic byte array: Used to store non-monotonic metadata.
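To illustrate why the monotonic part lends itself to output sharing, here is a minimal sketch. It is not the Lucene.NET implementation: it assumes a simplified rule in which the shared part of two long arrays is their element-wise minimum and each arc keeps only the remainder. The names `MonotonicSharingSketch`, `Common`, `Subtract`, and `Add` are hypothetical.

```csharp
using System;

// Illustrative only: a simplified model of sharing monotonic long[] outputs.
public static class MonotonicSharingSketch
{
    // Shared part of two outputs: the element-wise minimum (assumed rule).
    public static long[] Common(long[] a, long[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");
        var result = new long[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = Math.Min(a[i], b[i]);
        return result;
    }

    // Remainder that stays on an arc once the shared part has been factored out.
    public static long[] Subtract(long[] value, long[] shared)
    {
        if (value.Length != shared.Length)
            throw new ArgumentException("Arrays must have the same length.");
        var result = new long[value.Length];
        for (int i = 0; i < value.Length; i++)
            result[i] = value[i] - shared[i];
        return result;
    }

    // Reader side: adding the parts found along a path restores the original value.
    public static long[] Add(long[] prefix, long[] rest)
    {
        if (prefix.Length != rest.Length)
            throw new ArgumentException("Arrays must have the same length.");
        var result = new long[prefix.Length];
        for (int i = 0; i < prefix.Length; i++)
            result[i] = prefix[i] + rest[i];
        return result;
    }
}
```

For two terms whose long arrays are {100, 7} and {130, 9}, `Common` yields {100, 7}, and the remainders are {0, 0} and {30, 2}. Because the values ascend with the terms, remainders stay non-negative and small.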
File format:
- TermsDict(.tst) --> Header, PostingsHeader, FieldSummary, DirOffset
- FieldSummary --> NumFields, <FieldNumber, NumTerms, SumTotalTermFreq?, SumDocFreq, DocCount, LongsSize, TermFST>^NumFields
- TermFST --> FST<TermData>
- TermData --> Flag, BytesSize?, LongDelta^LongsSize?, Byte^BytesSize?, <DocFreq[Same?], (TotalTermFreq-DocFreq)>?
- Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
- DirOffset --> Uint64 (WriteInt64(Int64))
- DocFreq, LongsSize, BytesSize, NumFields, FieldNumber, DocCount --> VInt (WriteVInt32(Int32))
- TotalTermFreq, NumTerms, SumTotalTermFreq, SumDocFreq, LongDelta --> VLong (WriteVInt64(Int64))
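VInt and VLong are variable-length encodings: each byte carries seven bits of the value, lowest-order group first, and the high bit is set on every byte except the last. The sketch below shows the idea for non-negative values. `VIntSketch`, `Encode`, and `Decode` are hypothetical helper names used for illustration; in Lucene.NET the actual writing is done by the `WriteVInt32` and `WriteVInt64` methods listed above.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: variable-length encoding of a non-negative integer.
public static class VIntSketch
{
    public static byte[] Encode(long value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), "Sketch handles non-negative values only.");
        var bytes = new List<byte>();
        // Emit 7 bits at a time; set the high bit while more bits remain.
        while ((value & ~0x7FL) != 0)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value); // last byte: high bit clear
        return bytes.ToArray();
    }

    public static long Decode(byte[] bytes)
    {
        long value = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            value |= (long)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // high bit clear marks the final byte
            shift += 7;
        }
        return value;
    }
}
```

Small values such as a typical DocFreq fit in a single byte, which is why the format uses these encodings for counts and deltas.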
Notes:
- The formats of PostingsHeader and the generic meta bytes are customized by the specific postings implementation: they contain arbitrary per-file data (such as parameters or versioning information) and per-term data (non-monotonic data such as pulsed postings data).
- The format of TermData is determined by the FST. Typically, monotonic metadata is dense around shallow arcs, while deeper arcs carry only generic bytes and term statistics.
- The Flag byte indicates which parts of the metadata exist on the current arc. In particular, the monotonic part is omitted when it is an array of 0s.
- Since LongsSize is fixed per field, it is written only once, in the field summary.
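The role of the Flag byte can be sketched as follows. The bit assignments here are an assumption made for illustration and may differ from the layout Lucene.NET actually writes. `TermDataFlagSketch` and its members are hypothetical names.

```csharp
using System;

// Illustrative only: one way a flag byte could record which metadata parts are present.
public static class TermDataFlagSketch
{
    // Hypothetical bit assignments.
    public const int HasLongs = 1 << 0; // monotonic long[] is not all zeros
    public const int HasBytes = 1 << 1; // generic byte[] is non-empty
    public const int HasStats = 1 << 2; // term statistics are present

    public static byte ComputeFlag(long[] longs, byte[] bytes, int docFreq)
    {
        int flag = 0;
        if (!AllZero(longs)) flag |= HasLongs;
        if (bytes != null && bytes.Length > 0) flag |= HasBytes;
        if (docFreq != 0) flag |= HasStats;
        return (byte)flag;
    }

    // An all-zero monotonic part carries no information and can be omitted.
    private static bool AllZero(long[] longs)
    {
        if (longs == null) return true;
        foreach (long v in longs)
        {
            if (v != 0) return false;
        }
        return true;
    }
}
```

A reader would test the same bits before attempting to read each optional part, so an arc with nothing but zeros costs a single byte.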
This is a Lucene.NET EXPERIMENTAL API, use at your own risk
Inheritance
System.Object
Lucene.Net.Codecs.FieldsConsumer
FSTTermsWriter
Implements
System.IDisposable
Inherited Members
Lucene.Net.Codecs.FieldsConsumer.Dispose()
Lucene.Net.Codecs.FieldsConsumer.Merge(Lucene.Net.Index.MergeState, Lucene.Net.Index.Fields)
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Lucene.Net.Codecs.Memory
Assembly: Lucene.Net.Codecs.dll
Syntax
public class FSTTermsWriter : FieldsConsumer, IDisposable
Constructors
FSTTermsWriter(SegmentWriteState, PostingsWriterBase)
Declaration
public FSTTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Index.SegmentWriteState | state | |
Lucene.Net.Codecs.PostingsWriterBase | postingsWriter |
Fields
TERMS_VERSION_CHECKSUM
Declaration
public const int TERMS_VERSION_CHECKSUM = 1
Field Value
Type | Description |
---|---|
System.Int32 |
TERMS_VERSION_CURRENT
Declaration
public const int TERMS_VERSION_CURRENT = 1
Field Value
Type | Description |
---|---|
System.Int32 |
TERMS_VERSION_START
Declaration
public const int TERMS_VERSION_START = 0
Field Value
Type | Description |
---|---|
System.Int32 |
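The three constants describe a range of format versions. As a hedged sketch of how a reader might use them, the code below assumes, based only on the constant names, that `TERMS_VERSION_CHECKSUM` marks the first version that includes a checksum. The constants are copied here so the example is self-contained; `TermsVersionSketch`, `IsSupported`, and `HasChecksum` are hypothetical names.

```csharp
using System;

// Illustrative only: version gating based on the constants documented above.
public static class TermsVersionSketch
{
    // Values copied from FSTTermsWriter.
    public const int TERMS_VERSION_START = 0;
    public const int TERMS_VERSION_CHECKSUM = 1;
    public const int TERMS_VERSION_CURRENT = 1;

    // A version is readable if it lies within the known range.
    public static bool IsSupported(int version)
    {
        return version >= TERMS_VERSION_START && version <= TERMS_VERSION_CURRENT;
    }

    // Assumption: files at or after this version carry a checksum.
    public static bool HasChecksum(int version)
    {
        return version >= TERMS_VERSION_CHECKSUM;
    }
}
```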
Methods
AddField(FieldInfo)
Declaration
public override TermsConsumer AddField(FieldInfo field)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Index.FieldInfo | field |
Returns
Type | Description |
---|---|
Lucene.Net.Codecs.TermsConsumer |
Overrides
Lucene.Net.Codecs.FieldsConsumer.AddField(Lucene.Net.Index.FieldInfo)
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | disposing |
Overrides
Implements
System.IDisposable