Class Lucene40DocValuesFormat
Lucene 4.0 DocValues format.
Files:
- .dv.cfs: compound container (CompoundFileDirectory)
- .dv.cfe: compound entries (CompoundFileDirectory)
- <segment><fieldNumber>.dat: data values
- <segment><fieldNumber>.idx: index into the .dat for DEREF types
There are several types of DocValues, each with its own encoding. All of them store their values in .dat entries within the compound file. For the dereferenced and sorted types, the .dat contains only the unique values, and an additional .idx file contains pointers to these unique values.
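As a rough illustration of the dereferenced/sorted split described above, the following C# sketch builds a table of unique values plus one pointer per document. `DerefSketch` and `Dereference` are hypothetical names, not Lucene.NET APIs; values are strings rather than raw bytes, and the pointers are positions in the unique table rather than the byte addresses or ordinals the real .idx stores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Lucene.NET: splits per-document values into
// a table of unique values (the role the .dat plays for deref/sorted types)
// and one pointer per document (the role the .idx plays).
public static class DerefSketch
{
    public static (List<string> Unique, int[] Pointers) Dereference(string[] perDoc)
    {
        var unique = new List<string>();
        var seen = new Dictionary<string, int>(StringComparer.Ordinal);
        var pointers = new int[perDoc.Length];
        for (int doc = 0; doc < perDoc.Length; doc++)
        {
            string value = perDoc[doc];
            if (!seen.TryGetValue(value, out int id))
            {
                id = unique.Count;      // first occurrence: append to the unique table
                seen.Add(value, id);
                unique.Add(value);
            }
            pointers[doc] = id;         // doc -> index of its value in the unique table
        }
        return (unique, pointers);
    }
}
```

For example, `Dereference(new[] { "b", "a", "b" })` yields the unique values `["b", "a"]` and the pointers `[0, 1, 0]`: the repeated value is stored once and referenced twice.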
Layout per type (the types are members of Lucene.Net.Codecs.Lucene40.LegacyDocValuesType; a trailing ^n means the preceding item is repeated n times):
- VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
- FIXED_INTS_8 .dat --> Header, ValueSize, Byte (WriteByte(Byte))^maxdoc
- FIXED_INTS_16 .dat --> Header, ValueSize, Short (WriteInt16(Int16))^maxdoc
- FIXED_INTS_32 .dat --> Header, ValueSize, Int32 (WriteInt32(Int32))^maxdoc
- FIXED_INTS_64 .dat --> Header, ValueSize, Int64 (WriteInt64(Int64))^maxdoc
- FLOAT_32 .dat --> Header, ValueSize, Float32^maxdoc
- FLOAT_64 .dat --> Header, ValueSize, Float64^maxdoc
- BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte (WriteByte(Byte)) * ValueSize)^maxdoc
- BYTES_VAR_STRAIGHT .idx --> Header, TotalBytes, Addresses
- BYTES_VAR_STRAIGHT .dat --> Header, (Byte (WriteByte(Byte)) * variable ValueSize)^maxdoc
- BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
- BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte (WriteByte(Byte)) * ValueSize)^NumValues
- BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
- BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte (WriteByte(Byte)) * variable ValueSize)^NumValues
- BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
- BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte (WriteByte(Byte)) * ValueSize)^NumValues
- BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
- BYTES_VAR_SORTED .dat --> Header, (Byte (WriteByte(Byte)) * variable ValueSize)^NumValues
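The VAR_INTS entry stores a MinValue next to the PackedStream, which suggests that values are packed as offsets from the minimum. The C# sketch below works out MinValue, the bits needed per value and a PackedType under that assumption. `VarIntsSketch.Plan` is a hypothetical helper, and the rule for choosing PackedType 1 is a guess, not the codec's documented behaviour. It needs .NET Core 3.0 or later for `BitOperations`.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch, not Lucene.NET code: plans a VAR_INTS stream by storing
// each value as (value - MinValue) with just enough bits for the range.
public static class VarIntsSketch
{
    // PackedType follows the documented convention:
    // 0 = compressed, 1 = written as plain 64-bit integers.
    public static (long MinValue, int BitsPerValue, byte PackedType) Plan(long[] values)
    {
        if (values.Length == 0) return (0, 1, (byte)0);

        long min = long.MaxValue, max = long.MinValue;
        foreach (long v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        // Unsigned difference: correct even when max - min overflows a signed long.
        ulong range = unchecked((ulong)max - (ulong)min);
        int bits = range == 0 ? 1 : 64 - BitOperations.LeadingZeroCount(range);

        // Assumption: if the range needs all 64 bits, delta-packing saves nothing,
        // so fall back to plain 64-bit integers.
        byte packedType = bits == 64 ? (byte)1 : (byte)0;
        return (min, bits, packedType);
    }
}
```

For values -3, 5 and 0 the sketch picks MinValue -3 and 4 bits per value, since the largest offset is 8.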
- Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
- PackedType --> Byte (WriteByte(Byte))
- MaxAddress, MinValue, DefaultValue --> Int64 (WriteInt64(Int64))
- PackedStream, Addresses, Ordinals --> PackedInt32s
- ValueSize, NumValues --> Int32 (WriteInt32(Int32))
- Float32 --> 32-bit float, converted to its raw bit pattern and then written as Int32 (WriteInt32(Int32))
- Float64 --> 64-bit float, converted to its raw bit pattern and then written as Int64 (WriteInt64(Int64))
- TotalBytes --> VLong (WriteVInt64(Int64))
- TotalVarBytes --> Int64 (WriteInt64(Int64))
- LengthPrefix --> Length of the data value as a VInt (WriteVInt32(Int32)), at most 2 bytes
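To make the Float32/Float64 rules concrete, this C# sketch takes the raw IEEE 754 bit pattern of a float or double and writes it as a fixed-width integer. The big-endian byte order matches how Lucene's DataOutput writes Int32 and Int64; the class and method names are hypothetical, and the code uses only the .NET base library rather than Lucene.NET itself.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helpers illustrating Float32/Float64: reinterpret the float's
// bits as an integer, then store that integer big-endian.
public static class FloatEncodingSketch
{
    public static byte[] EncodeFloat32(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value); // raw bits, no numeric conversion
        var buffer = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(buffer, bits);
        return buffer;
    }

    public static float DecodeFloat32(byte[] buffer) =>
        BitConverter.Int32BitsToSingle(BinaryPrimitives.ReadInt32BigEndian(buffer));

    public static byte[] EncodeFloat64(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        var buffer = new byte[8];
        BinaryPrimitives.WriteInt64BigEndian(buffer, bits);
        return buffer;
    }

    public static double DecodeFloat64(byte[] buffer) =>
        BitConverter.Int64BitsToDouble(BinaryPrimitives.ReadInt64BigEndian(buffer));
}
```

For instance, 1.0f has the bit pattern 0x3F800000, so `EncodeFloat32(1.0f)` produces the bytes 3F 80 00 00. Because only the bits are copied, decoding returns exactly the original value.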
- PackedType is 0 when the stream is compressed and 1 when it is written as 64-bit integers.
- Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT case each entry can have a different length, so the address for docid+1 is retrieved as well to determine the length. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses stream contains maxdoc+1 indices. For the deduplicated VAR_DEREF case, each length is instead encoded as a prefix to the data itself, as a VInt (WriteVInt32(Int32)) of at most 2 bytes.
- Ordinals stores the term ID in sorted order (indexed by docid). In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize), because the byte length is fixed. In the VAR_SORTED case there is double indirection (docid -> ordinal -> address), and an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, the address of ord+1 is looked up as well.
- BYTES_VAR_STRAIGHT, in contrast to the other straight variants, uses a .idx file to improve lookup performance. In contrast to BYTES_VAR_DEREF, it does not deduplicate the document values.
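The address and ordinal lookups described above can be sketched in C# as plain array arithmetic. This assumes the Addresses and Ordinals streams have already been decoded into arrays (in the real format they are PackedInt32s), that addresses are byte offsets into the value data, and that the first ValueSize in Header+ValueSize+(ordinal*ValueSize) refers to the 4-byte Int32 field. `DocValuesLookupSketch` and its methods are hypothetical.

```csharp
using System;

// Hypothetical sketch of the documented lookups over already-decoded streams.
public static class DocValuesLookupSketch
{
    // BYTES_VAR_STRAIGHT: addresses has maxdoc + 1 entries (the last is the
    // sentinel), so docId + 1 is always a valid index for computing the length.
    public static (long Start, int Length) VarStraight(long[] addresses, int docId)
    {
        long start = addresses[docId];
        return (start, checked((int)(addresses[docId + 1] - start)));
    }

    // BYTES_FIXED_SORTED: Header + ValueSize + (ordinal * ValueSize), where
    // headerLength is the codec header size in bytes and ValueSize is an Int32.
    public static long FixedSortedOffset(int headerLength, int valueSize, int[] ordinals, int docId)
    {
        long ordinal = ordinals[docId];
        return headerLength + sizeof(int) + ordinal * valueSize;
    }

    // BYTES_VAR_SORTED: docid -> ordinal -> address; the sentinel address
    // guarantees that ord + 1 exists when computing the length.
    public static (long Start, int Length) VarSorted(int[] ordinals, long[] addresses, int docId)
    {
        int ord = ordinals[docId];
        long start = addresses[ord];
        return (start, checked((int)(addresses[ord + 1] - start)));
    }
}
```

Going by the BYTES_FIXED_STRAIGHT .dat layout, that type would need the same arithmetic as `FixedSortedOffset`, with the docid in place of the ordinal.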
Limitations:
- Binary doc values can be at most MAX_BINARY_FIELD_LENGTH in length.
Namespace: Lucene.Net.Codecs.Lucene40
Assembly: Lucene.Net.dll
Syntax
public class Lucene40DocValuesFormat : DocValuesFormat
Constructors
Lucene40DocValuesFormat()
Sole constructor.
Declaration
public Lucene40DocValuesFormat()
Fields
MAX_BINARY_FIELD_LENGTH
Maximum length for each binary doc values field.
Declaration
public static readonly int MAX_BINARY_FIELD_LENGTH
Field Value
Type | Description |
---|---|
System.Int32 | |
Methods
FieldsConsumer(SegmentWriteState)
Declaration
public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)
Parameters
Type | Name | Description |
---|---|---|
SegmentWriteState | state | |
Returns
Type | Description |
---|---|
DocValuesConsumer | |
Overrides
FieldsProducer(SegmentReadState)
Declaration
public override DocValuesProducer FieldsProducer(SegmentReadState state)
Parameters
Type | Name | Description |
---|---|---|
SegmentReadState | state | |
Returns
Type | Description |
---|---|
DocValuesProducer | |