Class SimpleTextDocValuesFormat

Plain text doc values format.

FOR RECREATIONAL USE ONLY

The .dat file contains the data. For numbers this is a "fixed-width" file, for example a single byte range:

 field myField
   type NUMERIC
   minvalue 0
   pattern 000
 005
 T
 234
 T
 123
 T
 ...

So a document's value (delta encoded from minvalue) can be retrieved by seeking to startOffset + (1+pattern.length()+2)*docid. The extra 1 is the newline. The extra 2 is another newline and 'T' or 'F': true if the value is real, false if missing.
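The seek arithmetic above can be sketched in C#. This is only an illustration, not the codec's own reader: `NumericLayoutSketch`, `RecordOffset`, and `Decode` are hypothetical names, `startOffset`, `pattern`, and `minValue` are assumed to have been read from the field header already, and the delta is assumed to fit in a `long`.

```csharp
using System;
using System.Globalization;

public static class NumericLayoutSketch
{
    // Each record is the padded value, a newline, the 'T'/'F' flag, and a newline.
    public static long RecordOffset(long startOffset, string pattern, int docId)
    {
        return startOffset + (1 + pattern.Length + 2) * (long)docId;
    }

    // Values are stored as deltas from minvalue, so add it back (assumption: fits in a long).
    public static long Decode(string paddedDelta, long minValue)
    {
        return minValue + long.Parse(paddedDelta, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // With pattern "000", every record occupies 1 + 3 + 2 = 6 bytes.
        Console.WriteLine(RecordOffset(100, "000", 2)); // 112
        Console.WriteLine(Decode("005", 0));            // 5
    }
}
```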

For bytes this is also a "fixed-width" file, for example:

 field myField
   type BINARY
   maxlength 6
   pattern 0
 length 6
 foobar[space][space]
 T
 length 3
 baz[space][space][space][space][space]
 T
 ...

So a doc's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength+2)*doc. The extra 9 is 2 newlines, plus "length " itself. The extra 2 is another newline and 'T' or 'F': true if the value is real, false if missing.
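As a rough sketch of the BINARY record arithmetic, the following C# computes the record width and offset and strips the padding from a value. The names `BinaryLayoutSketch`, `RecordWidth`, `RecordOffset`, and `StripPadding` are hypothetical, and the header values are assumed to have been parsed beforehand.

```csharp
using System;
using System.Text;

public static class BinaryLayoutSketch
{
    // "length " (7 chars) + length digits + newline, the padded value + newline,
    // then the 'T'/'F' flag + newline.
    public static long RecordWidth(string pattern, int maxLength)
    {
        return 9 + pattern.Length + maxLength + 2;
    }

    public static long RecordOffset(long startOffset, string pattern, int maxLength, int doc)
    {
        return startOffset + RecordWidth(pattern, maxLength) * doc;
    }

    // The stored value is space-padded, so keep only the declared length.
    public static byte[] StripPadding(byte[] padded, int length)
    {
        var value = new byte[length];
        Array.Copy(padded, value, length);
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(RecordWidth("0", 6));        // 18
        Console.WriteLine(RecordOffset(0, "0", 6, 3)); // 54
        byte[] padded = Encoding.ASCII.GetBytes("baz   ");
        Console.WriteLine(Encoding.ASCII.GetString(StripPadding(padded, 3))); // baz
    }
}
```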

For sorted bytes this is a fixed-width file, for example:

 field myField
   type SORTED
   numvalues 10
   maxLength 8
   pattern 0
   ordpattern 00
 length 6
 foobar[space][space]
 length 3
 baz[space][space][space][space][space]
 ...
 03
 06
 01
 10
 ...

So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues. A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
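The three SORTED offsets can be worked through with the header values from the example (`numvalues 10`, `maxLength 8`, `pattern 0`, `ordpattern 00`). This is a sketch under the assumption that `startOffset` is 0; `SortedLayoutSketch` and its method names are hypothetical, and how a stored ord maps to a value index is left out because the text above does not specify it.

```csharp
using System;

public static class SortedLayoutSketch
{
    // One value record: "length " + length digits + newline, padded value + newline.
    public static long ValueRecordWidth(string pattern, int maxLength)
    {
        return 9 + pattern.Length + maxLength;
    }

    // The ords follow directly after all numValues value records.
    public static long OrdSectionStart(long startOffset, string pattern, int maxLength, int numValues)
    {
        return startOffset + ValueRecordWidth(pattern, maxLength) * numValues;
    }

    // Each ord entry is the padded ord plus one newline.
    public static long DocOrdOffset(long ordSectionStart, string ordPattern, int docId)
    {
        return ordSectionStart + (1 + ordPattern.Length) * (long)docId;
    }

    public static long OrdValueOffset(long startOffset, string pattern, int maxLength, int ord)
    {
        return startOffset + ValueRecordWidth(pattern, maxLength) * ord;
    }

    public static void Main()
    {
        long ordSection = OrdSectionStart(0, "0", 8, 10);
        Console.WriteLine(ordSection);                        // 180
        Console.WriteLine(DocOrdOffset(ordSection, "00", 2)); // 186
        Console.WriteLine(OrdValueOffset(0, "0", 8, 3));      // 54
    }
}
```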

For sorted set this is a fixed-width file very similar to the SORTED case, for example:

 field myField
   type SORTED_SET
   numvalues 10
   maxLength 8
   pattern 0
   ordpattern XXXXX
 length 6
 foobar[space][space]
 length 3
 baz[space][space][space][space][space]
 ...
 0,3,5   
 1,2

 10
 ...

So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues. A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. This is a comma-separated list, and it is padded with spaces to be fixed width, so trim() and split() it, and beware the empty string (a document with no values)! An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
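The trim-and-split step, including the empty-string case, might look like the following C# sketch. `SortedSetOrdListSketch` and `ParseOrdList` are hypothetical names, and the input is assumed to be one fixed-width line already read from the ord section. The explicit empty check matters because splitting an empty string in .NET yields a single empty element rather than an empty array.

```csharp
using System;
using System.Globalization;

public static class SortedSetOrdListSketch
{
    public static long[] ParseOrdList(string paddedLine)
    {
        // Remove the space padding that makes every line the same width.
        string trimmed = paddedLine.Trim();

        // A document without values leaves nothing to split.
        if (trimmed.Length == 0)
        {
            return Array.Empty<long>();
        }

        string[] parts = trimmed.Split(',');
        var ords = new long[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            ords[i] = long.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return ords;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", ParseOrdList("0,3,5   "))); // 0,3,5
        Console.WriteLine(ParseOrdList("     ").Length);               // 0
    }
}
```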

The reader can just scan this file when it opens, skipping over the data blocks and saving the offset/etc for each field.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

Inheritance

System.Object

Lucene.Net.Codecs.DocValuesFormat

SimpleTextDocValuesFormat

Inherited Members

Lucene.Net.Codecs.DocValuesFormat.SetDocValuesFormatFactory(Lucene.Net.Codecs.IDocValuesFormatFactory)

Lucene.Net.Codecs.DocValuesFormat.GetDocValuesFormatFactory()

Lucene.Net.Codecs.DocValuesFormat.Name

Lucene.Net.Codecs.DocValuesFormat.ToString()

Lucene.Net.Codecs.DocValuesFormat.ForName(System.String)

Lucene.Net.Codecs.DocValuesFormat.AvailableDocValuesFormats

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

Namespace: Lucene.Net.Codecs.SimpleText

Assembly: Lucene.Net.Codecs.dll

Syntax

[DocValuesFormatName("SimpleText")]
public class SimpleTextDocValuesFormat : DocValuesFormat

Constructors


SimpleTextDocValuesFormat()

Declaration

public SimpleTextDocValuesFormat()

Methods


FieldsConsumer(SegmentWriteState)

Declaration

public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)

Parameters

Type	Name	Description
Lucene.Net.Index.SegmentWriteState	state

Returns

Type	Description
Lucene.Net.Codecs.DocValuesConsumer

Overrides

Lucene.Net.Codecs.DocValuesFormat.FieldsConsumer(Lucene.Net.Index.SegmentWriteState)


FieldsProducer(SegmentReadState)

Declaration

public override DocValuesProducer FieldsProducer(SegmentReadState state)

Parameters

Type	Name	Description
Lucene.Net.Index.SegmentReadState	state

Returns

Type	Description
Lucene.Net.Codecs.DocValuesProducer

Overrides

Lucene.Net.Codecs.DocValuesFormat.FieldsProducer(Lucene.Net.Index.SegmentReadState)