
    Class SimpleTextDocValuesFormat

    Plain text doc values format.

    FOR RECREATIONAL USE ONLY

    The .dat file contains the data. For numbers this is a "fixed-width" file, for example a single byte range:

     field myField
       type NUMERIC
       minvalue 0
       pattern 000
     005
     T
     234
     T
     123
     T
     ...

    So a document's value (delta-encoded from minvalue) can be retrieved by seeking to startOffset + (1+pattern.length()+2)*docid. The extra 1 is the newline after the value. The extra 2 is the 'T' or 'F' flag and its newline: 'T' if the value is real, 'F' if it is missing.
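    The seek arithmetic above can be sketched in a few lines. This is only an illustration in Python, not the codec's implementation: the function names, the in-memory `data` buffer, and the assumption that the data block starts at `start_offset` with ASCII digits are all hypothetical.

```python
from typing import Optional


def numeric_record_offset(start_offset: int, pattern: str, doc_id: int) -> int:
    """Byte offset of a document's record in a NUMERIC data block."""
    # value digits + newline (1) + flag and its newline (2)
    record_width = 1 + len(pattern) + 2
    return start_offset + record_width * doc_id


def read_numeric(data: bytes, start_offset: int, pattern: str,
                 min_value: int, doc_id: int) -> Optional[int]:
    """Return the decoded value, or None when the flag line is not 'T'."""
    pos = numeric_record_offset(start_offset, pattern, doc_id)
    delta = int(data[pos:pos + len(pattern)].decode("ascii"))
    flag_pos = pos + len(pattern) + 1  # skip the digits and their newline
    if data[flag_pos:flag_pos + 1] != b"T":
        return None  # value is missing for this document
    return min_value + delta  # values are stored as deltas from minvalue


# Hypothetical data block matching the example above (pattern 000, minvalue 0).
sample = b"005\nT\n234\nT\n123\nT\n"
second = read_numeric(sample, 0, "000", 0, 1)
```

    Because every record has the same width, the lookup is a single multiplication and a slice, with no scanning.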

    For bytes this is also a "fixed-width" file, for example:

     field myField
       type BINARY
       maxlength 6
       pattern 0
     length 6
     foobar[space][space]
     T
     length 3
     baz[space][space][space][space][space]
     T
     ...

    So a document's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength+2)*docid. The extra 9 is 2 newlines plus "length " itself. The extra 2 is the 'T' or 'F' flag and its newline: 'T' if the value is real, 'F' if it is missing.
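    As a rough sketch of the BINARY lookup, assuming the data block is available as an in-memory buffer and every value is padded to exactly maxlength bytes: the Python function names and the sample buffer below are hypothetical and only illustrate the offset formula.

```python
from typing import Optional


def binary_record_offset(start_offset: int, pattern: str,
                         max_length: int, doc_id: int) -> int:
    """Byte offset of a document's record in a BINARY data block."""
    # "length " (7) + two newlines (2) = 9, then the length digits,
    # the padded value, and finally the flag with its newline (2).
    record_width = 9 + len(pattern) + max_length + 2
    return start_offset + record_width * doc_id


def read_binary(data: bytes, start_offset: int, pattern: str,
                max_length: int, doc_id: int) -> Optional[bytes]:
    """Return the unpadded value, or None when the flag line is not 'T'."""
    pos = binary_record_offset(start_offset, pattern, max_length, doc_id)
    pos += len("length ")
    length = int(data[pos:pos + len(pattern)].decode("ascii"))
    pos += len(pattern) + 1  # skip the length digits and their newline
    value = data[pos:pos + length]  # drop the space padding
    flag_pos = pos + max_length + 1  # skip the padded value and its newline
    if data[flag_pos:flag_pos + 1] != b"T":
        return None
    return value


# Hypothetical block with maxlength 6 and pattern "0".
sample = b"length 6\nfoobar\nT\nlength 3\nbaz   \nT\n"
second = read_binary(sample, 0, "0", 6, 1)
```

    The stored length is what separates real trailing spaces in a value from padding, so the sketch slices by `length` rather than stripping whitespace.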

    For sorted bytes this is a fixed-width file, for example:

     field myField
       type SORTED
       numvalues 10
       maxLength 8
       pattern 0
       ordpattern 00
     length 6
     foobar[space][space]
     length 3
     baz[space][space][space][space][space]
     ...
     03
     06
     01
     10
     ...

    So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues. A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
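    The two-level lookup for SORTED fields (document to ord, ord to value) multiplies a fixed record width by an index in both sections. A minimal Python sketch follows, with hypothetical names and sample data. It returns the number exactly as stored in the ord section and makes no assumption about how the codec maps that number to a logical ord.

```python
from dataclasses import dataclass


@dataclass
class SortedLayout:
    """Hypothetical holder for one SORTED field's header values."""
    start_offset: int
    pattern: str
    ord_pattern: str
    max_length: int
    num_values: int

    @property
    def value_width(self) -> int:
        # "length " (7) + two newlines (2) = 9; no T/F flag in this section
        return 9 + len(self.pattern) + self.max_length

    @property
    def ord_section(self) -> int:
        return self.start_offset + self.value_width * self.num_values

    def stored_ord(self, data: bytes, doc_id: int) -> int:
        """Number written in the ord section for this document."""
        pos = self.ord_section + (1 + len(self.ord_pattern)) * doc_id
        return int(data[pos:pos + len(self.ord_pattern)].decode("ascii"))

    def value(self, data: bytes, ord_: int) -> bytes:
        """Unpadded value stored at the given index in the value section."""
        pos = self.start_offset + self.value_width * ord_ + len("length ")
        length = int(data[pos:pos + len(self.pattern)].decode("ascii"))
        pos += len(self.pattern) + 1  # skip the length digits and newline
        return data[pos:pos + length]


# Hypothetical block: two values (maxlength 6), then one ord line per document.
sample = b"length 6\nfoobar\nlength 3\nbaz   \n" + b"01\n00\n01\n"
layout = SortedLayout(0, "0", "00", 6, 2)
```

    For example, `layout.value(sample, layout.stored_ord(sample, 0))` reads the first document's stored number and then the value at that index.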

    For sorted set this is a fixed-width file very similar to the SORTED case, for example:

     field myField
       type SORTED_SET
       numvalues 10
       maxLength 8
       pattern 0
       ordpattern XXXXX
     length 6
     foobar[space][space]
     length 3
     baz[space][space][space][space][space]
     ...
     0,3,5   
     1,2
    
     10
     ...

    So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues. A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. This is a comma-separated list, padded with spaces to a fixed width, so trim() and split() it, and beware the empty string! An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
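    The trim-and-split step, including the empty-string case, might look like the following Python sketch. The helper names and the sample ord section are hypothetical; only the record width and the comma-separated, space-padded layout come from the description above.

```python
from typing import List


def parse_ord_list(raw: str) -> List[int]:
    """Parse one fixed-width, space-padded, comma-separated ord line."""
    trimmed = raw.strip()
    if not trimmed:
        # A document with no values is stored as padding only;
        # splitting "" would yield [""] and int("") would raise.
        return []
    return [int(part) for part in trimmed.split(",")]


def read_ord_list(data: bytes, ord_section: int,
                  ord_pattern: str, doc_id: int) -> List[int]:
    """Seek to a document's ord line and parse it."""
    record_width = 1 + len(ord_pattern)  # padded list + newline
    pos = ord_section + record_width * doc_id
    raw = data[pos:pos + len(ord_pattern)].decode("ascii")
    return parse_ord_list(raw)


# Hypothetical ord section for ordpattern XXXXX (width 5).
ords = b"0,3,5\n1,2  \n     \n10   \n"
third = read_ord_list(ords, 0, "XXXXX", 2)
```

    Checking for the empty string before splitting keeps a document without values from surfacing as a parse error.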

    The reader can simply scan this file when it opens, skipping over the data blocks and saving the offset and other header values for each field.

    @lucene.experimental

    Inheritance
    System.Object
    DocValuesFormat
    SimpleTextDocValuesFormat
    Inherited Members
    DocValuesFormat.SetDocValuesFormatFactory(IDocValuesFormatFactory)
    DocValuesFormat.GetDocValuesFormatFactory()
    DocValuesFormat.Name
    DocValuesFormat.ToString()
    DocValuesFormat.ForName(String)
    DocValuesFormat.AvailableDocValuesFormats
    Namespace: Lucene.Net.Codecs.SimpleText
    Assembly: Lucene.Net.Codecs.dll
    Syntax
    public class SimpleTextDocValuesFormat : DocValuesFormat

    Constructors


    SimpleTextDocValuesFormat()

    Declaration
    public SimpleTextDocValuesFormat()

    Methods


    FieldsConsumer(SegmentWriteState)

    Declaration
    public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)
    Parameters
    Type Name Description
    SegmentWriteState state
    Returns
    Type Description
    DocValuesConsumer
    Overrides
    DocValuesFormat.FieldsConsumer(SegmentWriteState)

    FieldsProducer(SegmentReadState)

    Declaration
    public override DocValuesProducer FieldsProducer(SegmentReadState state)
    Parameters
    Type Name Description
    SegmentReadState state
    Returns
    Type Description
    DocValuesProducer
    Overrides
    DocValuesFormat.FieldsProducer(SegmentReadState)
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)