Class FSTTermsWriter
An FST-based term dictionary that uses term metadata as the FST output.
The FST directly holds the <term, metadata> mapping.
Term metadata consists of three parts:
- term statistics: docFreq, totalTermFreq;
- monotonic long[], e.g. the pointer to the postings list for that term;
- generic byte[], e.g. other information needed by the postings reader.
File:
- .tst: Term Dictionary
Term Dictionary
The .tst file contains a list of FSTs, one for each field. Each FST maps a term to its statistics (e.g. docFreq) and metadata (e.g. information for the postings reader, such as the file pointer to the postings list).
Typically the metadata is separated into two parts:
- Monotonic long array: Some metadata is always in ascending order with respect to term order. The FST uses this part to share outputs between arcs.
- Generic byte array: Used to store non-monotonic metadata.
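The output sharing described above can be illustrated with a minimal sketch. It assumes a simplified notion of "shared output", the element-wise minimum of two long arrays of equal length. The class `MonotonicShareSketch` and its methods are hypothetical and are not part of Lucene.NET.

```csharp
using System;

// Hypothetical illustration; not part of Lucene.NET.
public static class MonotonicShareSketch
{
    // Element-wise minimum: the portion of the output that two terms
    // can share on an arc closer to the root.
    public static long[] Common(long[] a, long[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Both arrays must have LongsSize elements.");
        }
        var shared = new long[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            shared[i] = Math.Min(a[i], b[i]);
        }
        return shared;
    }

    // What remains for deeper arcs once the shared part has been factored out.
    public static long[] Subtract(long[] value, long[] shared)
    {
        var rest = new long[value.Length];
        for (int i = 0; i < value.Length; i++)
        {
            rest[i] = value[i] - shared[i];
        }
        return rest;
    }
}
```

For two terms with values {100, 40} and {130, 55}, the shared part is {100, 40} and the remainders are {0, 0} and {30, 15}. Because the values only grow with term order, the remainders never go negative.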
File format:
- TermsDict(.tst) --> Header, PostingsHeader, FieldSummary, DirOffset
- FieldSummary --> NumFields, <FieldNumber, NumTerms, SumTotalTermFreq?, SumDocFreq, DocCount, LongsSize, TermFST>^NumFields
- TermFST --> FST<TermData>
- TermData --> Flag, BytesSize?, LongDelta^LongsSize?, Byte^BytesSize?, <DocFreq[Same?], (TotalTermFreq-DocFreq)>?
- Header --> CodecHeader (Lucene.Net.Codecs.CodecUtil.WriteHeader(Lucene.Net.Store.DataOutput,System.String,System.Int32))
- DirOffset --> Uint64 (Lucene.Net.Store.DataOutput.WriteInt64(System.Int64))
- DocFreq, LongsSize, BytesSize, NumFields, FieldNumber, DocCount --> VInt (Lucene.Net.Store.DataOutput.WriteVInt32(System.Int32))
- TotalTermFreq, NumTerms, SumTotalTermFreq, SumDocFreq, LongDelta --> VLong (Lucene.Net.Store.DataOutput.WriteVInt64(System.Int64))
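Most of the fields above are stored as VInt or VLong values. As a rough sketch of that variable-length layout: each byte carries seven bits of the value, low-order bits first, and the high bit is set when more bytes follow. The class `VarIntSketch` is hypothetical and is not the Lucene.NET implementation. It assumes non-negative input.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of the VInt/VLong byte layout; not the Lucene.NET implementation.
public static class VarIntSketch
{
    // Encodes a non-negative value using 7 payload bits per byte, low-order bits first.
    // The high bit of a byte is set when at least one more byte follows.
    public static byte[] Encode(long value)
    {
        var bytes = new List<byte>();
        ulong v = (ulong)value;
        while ((v & ~0x7FUL) != 0)
        {
            bytes.Add((byte)((v & 0x7F) | 0x80));
            v >>= 7;
        }
        bytes.Add((byte)v);
        return bytes.ToArray();
    }

    // Reverses Encode: accumulates 7 bits per byte until a byte without the high bit.
    public static long Decode(byte[] bytes)
    {
        ulong result = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                break;
            }
            shift += 7;
        }
        return (long)result;
    }
}
```

Under this scheme, values below 128 take a single byte, which keeps small counts such as DocFreq compact.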
Notes:
- The formats of PostingsHeader and the generic metadata bytes are defined by the specific postings implementation: they contain arbitrary per-file data (such as parameters or versioning information) and per-term data (non-monotonic data such as pulsed postings).
- The format of TermData is determined by the FST. Typically, monotonic metadata is dense around shallow arcs, while deeper arcs hold only generic bytes and term statistics.
- The Flag byte indicates which parts of the metadata exist on the current arc. In particular, the monotonic part is omitted when it is an array of 0s.
- Since LongsSize is fixed per field, it is written only once, in the field summary.
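The role of the Flag byte can be sketched as a small bit set with one bit per optional part of TermData. The bit positions and the names `HasLongs`, `HasBytes` and `HasStats` are assumptions made for illustration. The class `TermDataFlagSketch` is hypothetical and does not describe the exact on-disk layout.

```csharp
using System;

// Hypothetical illustration; the bit positions are assumptions,
// not a statement of the actual on-disk layout.
public static class TermDataFlagSketch
{
    public const int HasLongs = 1 << 0; // monotonic long[] part is present (not all zeros)
    public const int HasBytes = 1 << 1; // generic byte[] part is present
    public const int HasStats = 1 << 2; // term statistics are present

    public static byte Compute(long[] longs, byte[] bytes, int docFreq)
    {
        int flag = 0;
        // The monotonic part is omitted when every element is zero.
        if (longs != null && Array.Exists(longs, v => v != 0))
        {
            flag |= HasLongs;
        }
        if (bytes != null && bytes.Length > 0)
        {
            flag |= HasBytes;
        }
        if (docFreq != 0)
        {
            flag |= HasStats;
        }
        return (byte)flag;
    }
}
```

A reader following this sketch would test each bit of the flag before attempting to read the corresponding part, so an arc that carries nothing costs a single zero byte.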
Note
This API is experimental and might change in incompatible ways in the next release.
Inheritance
System.Object
Lucene.Net.Codecs.FieldsConsumer
FSTTermsWriter
Implements
System.IDisposable
Inherited Members
Lucene.Net.Codecs.FieldsConsumer.Dispose()
Lucene.Net.Codecs.FieldsConsumer.Merge(Lucene.Net.Index.MergeState, Lucene.Net.Index.Fields)
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Lucene.Net.Codecs.Memory
Assembly: Lucene.Net.Codecs.dll
Syntax
public class FSTTermsWriter : FieldsConsumer, IDisposable
Constructors
FSTTermsWriter(SegmentWriteState, PostingsWriterBase)
Declaration
public FSTTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Index.SegmentWriteState | state | |
Lucene.Net.Codecs.PostingsWriterBase | postingsWriter | |
Fields
TERMS_VERSION_CHECKSUM
Declaration
public const int TERMS_VERSION_CHECKSUM = 1
Field Value
Type | Description |
---|---|
System.Int32 | |
TERMS_VERSION_CURRENT
Declaration
public const int TERMS_VERSION_CURRENT = 1
Field Value
Type | Description |
---|---|
System.Int32 | |
TERMS_VERSION_START
Declaration
public const int TERMS_VERSION_START = 0
Field Value
Type | Description |
---|---|
System.Int32 | |
Methods
AddField(FieldInfo)
Declaration
public override TermsConsumer AddField(FieldInfo field)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Index.FieldInfo | field | |
Returns
Type | Description |
---|---|
Lucene.Net.Codecs.TermsConsumer | |
Overrides
Lucene.Net.Codecs.FieldsConsumer.AddField(Lucene.Net.Index.FieldInfo)
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | disposing | |
Overrides
Lucene.Net.Codecs.FieldsConsumer.Dispose(System.Boolean)
Implements
System.IDisposable