Class Lucene40DocValuesFormat
Lucene 4.0 DocValues format.
Files:
- .dv.cfs: compound container (CompoundFileDirectory)
- .dv.cfe: compound entries (CompoundFileDirectory)
- <segment><fieldNumber>.dat: data values
- <segment><fieldNumber>.idx: index into the .dat for DEREF types
There are several types of DocValues, each with a different encoding.
From the perspective of filenames, all types store their values in .dat entries within the compound file. In the case of dereferenced/sorted types, the .dat actually contains only the unique values, and an additional .idx file contains pointers to these unique values.
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_8 .dat --> Header, ValueSize, Byte (WriteByte(Byte))^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_16 .dat --> Header, ValueSize, Short (WriteInt16(Int16))^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_32 .dat --> Header, ValueSize, Int32 (WriteInt32(Int32))^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_64 .dat --> Header, ValueSize, Int64 (WriteInt64(Int64))^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FLOAT_32 .dat --> Header, ValueSize, Float32^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FLOAT_64 .dat --> Header, ValueSize, Float64^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte (WriteByte(Byte)) * ValueSize)^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT .idx --> Header, TotalBytes, Addresses
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT .dat --> Header, (Byte (WriteByte(Byte)) * variable ValueSize)^maxdoc
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte (WriteByte(Byte)) * ValueSize)^NumValues
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte (WriteByte(Byte)) * variable ValueSize)^NumValues
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte (WriteByte(Byte)) * ValueSize)^NumValues
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_SORTED .dat --> Header, (Byte (WriteByte(Byte)) * variable ValueSize)^NumValues
- Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
- PackedType --> Byte (WriteByte(Byte))
- MaxAddress, MinValue, DefaultValue --> Int64 (WriteInt64(Int64))
- PackedStream, Addresses, Ordinals --> PackedInt32s
- ValueSize, NumValues --> Int32 (WriteInt32(Int32))
- Float32 --> 32-bit float encoded with SingleToRawInt32Bits(Single) then written as Int32 (WriteInt32(Int32))
- Float64 --> 64-bit float encoded with DoubleToRawInt64Bits(Double) then written as Int64 (WriteInt64(Int64))
- TotalBytes --> VLong (WriteVInt64(Int64))
- TotalVarBytes --> Int64 (WriteInt64(Int64))
- LengthPrefix --> Length of the data value as VInt (WriteVInt32(Int32)) (maximum of 2 bytes)
- PackedType is 0 when the stream is compressed, 1 when the stream is written as 64-bit integers.
- Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT case, each entry can have a different length, so to determine the length, docid+1 is retrieved. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses stream contains maxdoc+1 indices. For the deduplicated VAR_DEREF case, each length is encoded as a prefix to the data itself as a VInt (WriteVInt32(Int32)) (maximum of 2 bytes).
- Ordinals stores the term ID in sorted order (indexed by docid). In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize) because the byte length is fixed. In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, ord+1's address is looked up as well.
- Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT, in contrast to the other straight variants, uses a .idx file to improve lookup performance. In contrast to Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF, it doesn't apply deduplication of the document values.
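The address arithmetic described in the notes on Addresses and Ordinals can be sketched as follows. This is an illustration only, not the codec's reader: `DocValuesAddressing` and its method names are hypothetical, the decoded Addresses and Ordinals streams are assumed to be available as plain `long[]` arrays, `headerLength` is assumed to be the byte length of the CodecHeader, and the first `ValueSize` term in the FIXED_SORTED formula is read here as the 4 bytes occupied by the ValueSize field itself.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class DocValuesAddressing
{
    // Assumption: ValueSize is written with WriteInt32, so the field occupies 4 bytes.
    private const int ValueSizeFieldBytes = 4;

    // FIXED_SORTED: Header + ValueSize + (ordinal * ValueSize).
    public static long FixedSortedOffset(int headerLength, int valueSize, int ordinal)
    {
        return headerLength + ValueSizeFieldBytes + (long)ordinal * valueSize;
    }

    // VAR_STRAIGHT: Addresses holds maxdoc + 1 entries, so the entry at
    // docId + 1 (the sentinel, for the last document) marks the end of the value.
    public static long VarStraightLength(long[] addresses, int docId)
    {
        return addresses[docId + 1] - addresses[docId];
    }

    // VAR_SORTED: double indirection, docid -> ordinal -> address.
    // The length is derived from the address stored for ord + 1.
    public static (long Start, long Length) VarSortedSlice(long[] ordinals, long[] addresses, int docId)
    {
        long ord = ordinals[docId];
        long start = addresses[ord];
        return (start, addresses[ord + 1] - start);
    }
}
```

Under these assumptions, a 20-byte header with a ValueSize of 8 places ordinal 3 at offset 20 + 4 + 3 * 8 = 48.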
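The Float32 and Float64 entries listed among the data types can be illustrated with a small standalone sketch. `FloatBitsSketch` is a hypothetical helper, not a Lucene.NET type; it uses only `System.BitConverter` to reinterpret the raw bits, and it assumes that WriteInt32 and WriteInt64 emit the high-order byte first.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class FloatBitsSketch
{
    // Float32: reinterpret the raw IEEE 754 bits as an Int32, then emit 4 bytes.
    public static byte[] EncodeFloat32(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        unchecked
        {
            // Assumption: high-order byte first.
            return new[]
            {
                (byte)(bits >> 24), (byte)(bits >> 16), (byte)(bits >> 8), (byte)bits
            };
        }
    }

    // Float64: reinterpret the raw IEEE 754 bits as an Int64, then emit 8 bytes.
    public static byte[] EncodeFloat64(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        var bytes = new byte[8];
        for (int i = 0; i < 8; i++)
        {
            // Assumption: high-order byte first.
            bytes[i] = unchecked((byte)(bits >> (56 - 8 * i)));
        }
        return bytes;
    }
}
```

With this byte order, `EncodeFloat32(1.0f)` produces the bytes 3F 80 00 00, the IEEE 754 bit pattern of 1.0.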
Limitations:
- Binary doc values can be at most MAX_BINARY_FIELD_LENGTH in length.
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Namespace: Lucene.Net.Codecs.Lucene40
Assembly: Lucene.Net.dll
Syntax
[Obsolete("Only for reading old 4.0 and 4.1 segments")]
[DocValuesFormatName("Lucene40")]
public class Lucene40DocValuesFormat : DocValuesFormat
Constructors
Lucene40DocValuesFormat()
Sole constructor.
Declaration
public Lucene40DocValuesFormat()
Fields
MAX_BINARY_FIELD_LENGTH
Maximum length for each binary doc values field.
Declaration
public static readonly int MAX_BINARY_FIELD_LENGTH
Field Value
Type | Description |
---|---|
System.Int32 |
Methods
FieldsConsumer(SegmentWriteState)
Declaration
public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)
Parameters
Type | Name | Description |
---|---|---|
SegmentWriteState | state |
Returns
Type | Description |
---|---|
DocValuesConsumer |
Overrides
FieldsProducer(SegmentReadState)
Declaration
public override DocValuesProducer FieldsProducer(SegmentReadState state)
Parameters
Type | Name | Description |
---|---|---|
SegmentReadState | state |
Returns
Type | Description |
---|---|
DocValuesProducer |