Class SimpleTextDocValuesFormat

Plain text doc values format.

FOR RECREATIONAL USE ONLY

The .dat file contains the data. For numbers this is a "fixed-width" file, for example with a single-byte range:

field myField
  type NUMERIC
  minvalue 0
  pattern 000
005
T
234
T
123
T
...

So a document's value (delta-encoded from minvalue) can be retrieved by seeking to startOffset + (1+pattern.length()+2)*docid. The extra 1 is the newline after the value. The extra 2 is the 'T' or 'F' flag plus the newline that follows it: 'T' if the value is real, 'F' if it is missing.
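
The seek arithmetic above can be sketched in C# as follows. This is a minimal illustration, not the codec's actual reader: the class and method names are hypothetical, the data is held in an in-memory ASCII string (so one character equals one byte), and the delta is parsed as a long for simplicity.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Lucene.NET.
public static class NumericDocValuesSketch
{
    // Each record is: padded delta, '\n', 'T' or 'F', '\n'.
    public static long RecordOffset(long startOffset, int patternLength, int docId)
    {
        return startOffset + (1 + patternLength + 2) * (long)docId;
    }

    // Returns null when the flag is not 'T' (the document has no value).
    public static long? ReadValue(string data, long startOffset, int patternLength, long minValue, int docId)
    {
        int pos = (int)RecordOffset(startOffset, patternLength, docId);
        string digits = data.Substring(pos, patternLength);
        char flag = data[pos + patternLength + 1]; // skip the newline after the digits
        if (flag != 'T')
        {
            return null;
        }
        // The stored number is a delta from minvalue.
        return minValue + long.Parse(digits, CultureInfo.InvariantCulture);
    }
}
```

With the sample data above (pattern 000, so six bytes per document), document 2 starts at offset 12 relative to startOffset.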

For bytes this is also a "fixed-width" file, for example:

field myField
  type BINARY
  maxlength 8
  pattern 0
length 6
foobar[space][space]
T
length 3
baz[space][space][space][space][space]
T
...

So a document's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength+2)*docid. The extra 9 is 2 newlines plus "length " itself. The extra 2 is the 'T' or 'F' flag plus the newline that follows it: 'T' if the value is real, 'F' if it is missing.
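
As a rough sketch of the same lookup for the BINARY layout, assuming ASCII data in an in-memory string and values padded to 8 characters as in the sample lines above. The class and method names are hypothetical and not part of Lucene.NET.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Lucene.NET.
public static class BinaryDocValuesSketch
{
    private const int LengthPrefix = 7; // "length " is 7 characters

    // Record: "length " + padded length + '\n' + padded value + '\n' + flag + '\n'.
    public static long RecordOffset(long startOffset, int patternLength, int maxLength, int docId)
    {
        return startOffset + (9 + patternLength + maxLength + 2) * (long)docId;
    }

    // Returns false when the flag is not 'T' (the document has no value).
    public static bool TryReadValue(string data, long startOffset, int patternLength,
                                    int maxLength, int docId, out string value)
    {
        int pos = (int)RecordOffset(startOffset, patternLength, maxLength, docId);
        int length = int.Parse(data.Substring(pos + LengthPrefix, patternLength),
                               CultureInfo.InvariantCulture);
        int valueStart = pos + LengthPrefix + patternLength + 1; // skip the newline
        char flag = data[valueStart + maxLength + 1];            // skip the padding and newline
        value = flag == 'T' ? data.Substring(valueStart, length) : string.Empty;
        return flag == 'T';
    }
}
```

Only the first "length" characters of the padded slot belong to the value; the rest is padding that keeps every record the same width.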

For sorted bytes this is a fixed-width file, for example:

field myField
  type SORTED
  numvalues 10
  maxlength 8
  pattern 0
  ordpattern 00
length 6
foobar[space][space]
length 3
baz[space][space][space][space][space]
...
03
06
01
10
...

So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numvalues. A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
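
The three offsets described above can be written out as a small C# sketch. The class and method names are hypothetical, chosen only to mirror the header keys; nothing here is taken from the actual reader.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class SortedDocValuesSketch
{
    // One value record: "length " + padded length + '\n' + padded value + '\n'.
    public static long ValueOffset(long startOffset, int patternLength, int maxLength, long ord)
    {
        return startOffset + (9 + patternLength + maxLength) * ord;
    }

    // The ord section starts right after the last value record.
    public static long OrdSectionOffset(long startOffset, int patternLength, int maxLength, long numValues)
    {
        return ValueOffset(startOffset, patternLength, maxLength, numValues);
    }

    // One ord record: padded ord + '\n'.
    public static long DocOrdOffset(long ordSectionOffset, int ordPatternLength, int docId)
    {
        return ordSectionOffset + (1 + ordPatternLength) * (long)docId;
    }
}
```

For the header in the example (numvalues 10, maxlength 8, pattern 0, ordpattern 00), each value record is 18 bytes, so the ord section begins 180 bytes after startOffset and each document's ord occupies 3 bytes.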

For sorted set this is a fixed-width file very similar to the SORTED case, for example:

field myField
  type SORTED_SET
  numvalues 10
  maxlength 8
  pattern 0
  ordpattern XXXXX
length 6
foobar[space][space]
length 3
baz[space][space][space][space][space]
...
0,3,5   
1,2

10
...

So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numvalues. A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. This is a comma-separated list, and it is padded with spaces to be fixed width, so trim() and split() it, and beware the empty string! An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
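
A minimal C# sketch of the trim-and-split step, including the empty-string case the text warns about. The class and method names are hypothetical; the input is assumed to be one fixed-width line from the ord section with its trailing newline already removed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Lucene.NET.
public static class SortedSetOrdListSketch
{
    public static long[] ParseOrds(string paddedLine)
    {
        // Remove the space padding that makes every line the same width.
        string trimmed = paddedLine.Trim();

        // A document with no values yields an empty line. Splitting "" would
        // produce one empty token, and parsing that token would throw.
        if (trimmed.Length == 0)
        {
            return Array.Empty<long>();
        }

        string[] parts = trimmed.Split(',');
        long[] ords = new long[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            ords[i] = long.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return ords;
    }
}
```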

The reader can just scan this file when it opens, skipping over the data blocks and saving the start offset and other header values for each field.

Note

This API is experimental and might change in incompatible ways in the next release.

Inheritance

object

DocValuesFormat

SimpleTextDocValuesFormat

Inherited Members

DocValuesFormat.SetDocValuesFormatFactory(IDocValuesFormatFactory)

DocValuesFormat.GetDocValuesFormatFactory()

DocValuesFormat.Name

DocValuesFormat.ToString()

DocValuesFormat.ForName(string)

DocValuesFormat.AvailableDocValuesFormats

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

Namespace: Lucene.Net.Codecs.SimpleText

Assembly: Lucene.Net.Codecs.dll

Syntax

[DocValuesFormatName("SimpleText")]
public class SimpleTextDocValuesFormat : DocValuesFormat

Constructors

SimpleTextDocValuesFormat()

Plain text doc values format.

Declaration

public SimpleTextDocValuesFormat()

Methods

FieldsConsumer(SegmentWriteState)

Returns a Lucene.Net.Codecs.DocValuesConsumer to write docvalues to the index.

Declaration

public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)

Parameters

Type	Name	Description
SegmentWriteState	state

Returns

Type	Description
DocValuesConsumer

Overrides

Lucene.Net.Codecs.DocValuesFormat.FieldsConsumer(Lucene.Net.Index.SegmentWriteState)

FieldsProducer(SegmentReadState)

Returns a Lucene.Net.Codecs.DocValuesProducer to read docvalues from the index.

NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.

Declaration

public override DocValuesProducer FieldsProducer(SegmentReadState state)

Parameters

Type	Name	Description
SegmentReadState	state

Returns

Type	Description
DocValuesProducer

Overrides

Lucene.Net.Codecs.DocValuesFormat.FieldsProducer(Lucene.Net.Index.SegmentReadState)