
    Class SimpleTextDocValuesFormat

    Plain text doc values format.

    FOR RECREATIONAL USE ONLY

    The .dat file contains the data. For numbers this is a "fixed-width" file; for example, for values that fit in a single-byte range:

     field myField
       type NUMERIC
       minvalue 0
       pattern 000
     005
     T
     234
     T
     123
     T
     ...

    So a document's value (delta encoded from minvalue) can be retrieved by seeking to startOffset + (1+pattern.length()+2)*docid. The extra 1 is the newline. The extra 2 is another newline and 'T' or 'F': true if the value is real, false if missing.
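    The seek arithmetic above can be sketched as follows. This is a minimal illustration, not the Lucene.NET reader: the class and method names are hypothetical, and it assumes the stored digits parse as a non-negative decimal delta.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating the NUMERIC layout; not part of Lucene.NET.
public static class NumericDocValuesOffsets
{
    // Bytes per document: the padded digits, a newline, then 'T'/'F' and a newline.
    public static long RecordWidth(string pattern) => 1 + pattern.Length + 2;

    // Absolute position of the padded digits for the given document.
    public static long ValueOffset(long startOffset, string pattern, int docId)
        => startOffset + RecordWidth(pattern) * docId;

    // Decodes one record: the digits are a delta from minValue,
    // and the flag 'F' means the document has no value.
    public static long? Decode(string digits, char flag, long minValue)
        => flag == 'T'
            ? minValue + long.Parse(digits, CultureInfo.InvariantCulture)
            : (long?)null;
}
```

    With the pattern 000 from the example, each record is 6 bytes wide, so document 2 starts at startOffset + 12.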

    For bytes this is also a "fixed-width" file, for example:

     field myField
       type BINARY
       maxlength 6
       pattern 0
     length 6
     foobar[space][space]
     T
     length 3
     baz[space][space][space][space][space]
     T
     ...

    So a document's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength+2)*docid. The extra 9 is 2 newlines plus "length " itself (7 characters). The extra 2 is another newline and 'T' or 'F': true if the value is real, false if missing.
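    As a rough sketch of the BINARY lookup, the helper below reads one value out of an in-memory copy of the data. The names are hypothetical, and it assumes ASCII content (so character positions equal byte offsets) and a maximum length of 8, matching the padded lines in the example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating the BINARY layout; not part of Lucene.NET.
public static class BinaryDocValuesOffsets
{
    // "length " (7) + length digits + newline, padded value + newline, flag + newline.
    public static long RecordWidth(string pattern, int maxLength)
        => 9 + pattern.Length + maxLength + 2;

    // Absolute position of the "length N" line for the given document.
    public static long RecordOffset(long startOffset, string pattern, int maxLength, int docId)
        => startOffset + RecordWidth(pattern, maxLength) * docId;

    // Reads the value for docId; returns false when the flag is 'F' (missing).
    public static bool TryReadValue(string data, long startOffset, string pattern,
                                    int maxLength, int docId, out string value)
    {
        int pos = (int)RecordOffset(startOffset, pattern, maxLength, docId);
        // Skip "length " to reach the length digits.
        int length = int.Parse(data.Substring(pos + 7, pattern.Length),
                               CultureInfo.InvariantCulture);
        // The padded value starts after the digits and their newline.
        int valueStart = pos + 7 + pattern.Length + 1;
        // The flag follows the padded value and its newline.
        char flag = data[valueStart + maxLength + 1];
        value = flag == 'T' ? data.Substring(valueStart, length) : "";
        return flag == 'T';
    }
}
```

    Under these assumptions each record is 20 bytes wide, so document 1 begins 20 bytes after startOffset.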

    For sorted bytes this is a fixed-width file, for example:

     field myField
       type SORTED
       numvalues 10
       maxLength 8
       pattern 0
       ordpattern 00
     length 6
     foobar[space][space]
     length 3
     baz[space][space][space][space][space]
     ...
     03
     06
     01
     10
     ...

    So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues. A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
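    The three SORTED offsets can be computed as in the sketch below. The class and method names are hypothetical and only restate the formulas from the description.

```csharp
using System;

// Hypothetical helper illustrating the SORTED layout; not part of Lucene.NET.
public static class SortedDocValuesOffsets
{
    // One value record: "length " (7) + length digits + newline, padded value + newline.
    public static long ValueRecordWidth(string pattern, int maxLength)
        => 9 + pattern.Length + maxLength;

    // The ord section starts right after all numValues value records.
    public static long OrdSectionOffset(long startOffset, string pattern,
                                        int maxLength, int numValues)
        => startOffset + ValueRecordWidth(pattern, maxLength) * numValues;

    // Each document's ord is the padded ord digits plus a newline.
    public static long DocOrdOffset(long ordSectionOffset, string ordPattern, int docId)
        => ordSectionOffset + (1 + ordPattern.Length) * docId;

    // Position of the value record for a given ord.
    public static long OrdValueOffset(long startOffset, string pattern,
                                      int maxLength, int ord)
        => startOffset + ValueRecordWidth(pattern, maxLength) * ord;
}
```

    With the example header (pattern 0, maxLength 8, numvalues 10, ordpattern 00) and a startOffset of 0, the ord section begins at offset 180, document 3's ord is at offset 189, and the value record for ord 2 is at offset 36.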

    For sorted set this is a fixed-width file very similar to the SORTED case, for example:

     field myField
       type SORTED_SET
       numvalues 10
       maxLength 8
       pattern 0
       ordpattern XXXXX
     length 6
     foobar[space][space]
     length 3
     baz[space][space][space][space][space]
     ...
     0,3,5   
     1,2
    
     10
     ...

    So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues. A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. The list is comma-separated and padded with spaces to a fixed width, so trim() and split() it, and beware the empty string: a document with no values is stored as an all-space line. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
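    A small sketch of the trim-and-split step, including the empty-string case. The helper name is hypothetical, and it assumes the line has already been read from the ord section without its trailing newline.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating SORTED_SET ord-list parsing; not part of Lucene.NET.
public static class SortedSetOrdList
{
    // Parses one fixed-width, space-padded, comma-separated ord line.
    public static long[] Parse(string paddedLine)
    {
        string trimmed = paddedLine.Trim();
        // An all-space line means the document has no ords; splitting it
        // would yield one empty token that cannot be parsed as a number.
        if (trimmed.Length == 0)
        {
            return Array.Empty<long>();
        }

        string[] parts = trimmed.Split(',');
        long[] ords = new long[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            ords[i] = long.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return ords;
    }
}
```

    Checking for the empty string before splitting avoids a parse failure on documents that have no values.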

    The reader can scan this file once when it opens, skipping over the data blocks and saving the offset and other header values for each field.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    Inheritance
    System.Object
    Lucene.Net.Codecs.DocValuesFormat
    SimpleTextDocValuesFormat
    Inherited Members
    Lucene.Net.Codecs.DocValuesFormat.SetDocValuesFormatFactory(Lucene.Net.Codecs.IDocValuesFormatFactory)
    Lucene.Net.Codecs.DocValuesFormat.GetDocValuesFormatFactory()
    Lucene.Net.Codecs.DocValuesFormat.Name
    Lucene.Net.Codecs.DocValuesFormat.ToString()
    Lucene.Net.Codecs.DocValuesFormat.ForName(System.String)
    Lucene.Net.Codecs.DocValuesFormat.AvailableDocValuesFormats
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Codecs.SimpleText
    Assembly: Lucene.Net.Codecs.dll
    Syntax
    [DocValuesFormatName("SimpleText")]
    public class SimpleTextDocValuesFormat : DocValuesFormat

    Constructors

    SimpleTextDocValuesFormat()

    Declaration
    public SimpleTextDocValuesFormat()

    Methods

    FieldsConsumer(SegmentWriteState)

    Declaration
    public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)
    Parameters
    Type Name Description
    Lucene.Net.Index.SegmentWriteState state
    Returns
    Type Description
    Lucene.Net.Codecs.DocValuesConsumer
    Overrides
    Lucene.Net.Codecs.DocValuesFormat.FieldsConsumer(Lucene.Net.Index.SegmentWriteState)

    FieldsProducer(SegmentReadState)

    Declaration
    public override DocValuesProducer FieldsProducer(SegmentReadState state)
    Parameters
    Type Name Description
    Lucene.Net.Index.SegmentReadState state
    Returns
    Type Description
    Lucene.Net.Codecs.DocValuesProducer
    Overrides
    Lucene.Net.Codecs.DocValuesFormat.FieldsProducer(Lucene.Net.Index.SegmentReadState)
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)