    Class SimpleTextDocValuesFormat

    Plain text doc values format.

    FOR RECREATIONAL USE ONLY

    The .dat file contains the data. For numbers this is a "fixed-width" file, for example a single byte range:

    field myField
      type NUMERIC
      minvalue 0
      pattern 000
    005
    T
    234
    T
    123
    T
    ...
    So a document's value (delta-encoded from minvalue) can be retrieved by seeking to startOffset + (1+pattern.length()+2)*docid. The extra 1 is the newline after the value. The extra 2 is the 'T' or 'F' flag and its newline: 'T' if the value is real, 'F' if it is missing.
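
    The seek arithmetic for NUMERIC fields can be sketched in C# as follows. This is only an illustration, not part of Lucene.NET: the class and method names are hypothetical, startOffset is assumed to be the byte position of the first value line, and every character (including the newline) is assumed to occupy one byte.

```csharp
using System;

// Hypothetical helper, not part of the Lucene.NET API.
public static class NumericDocValuesOffsets
{
    // One record is: the pattern-width value, '\n', the 'T'/'F' flag, '\n'.
    public static long RecordWidth(string pattern)
    {
        return 1 + pattern.Length + 2;
    }

    // Byte position of the value line for the given document.
    public static long ValueOffset(long startOffset, string pattern, int docId)
    {
        return startOffset + RecordWidth(pattern) * docId;
    }
}
```

    With the pattern 000 from the sample, each record is 6 bytes wide, so if the first value line started at offset 100, the value line of docid 2 (123 in the sample) would start at offset 112.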

    For bytes this is also a "fixed-width" file, for example:

    field myField
      type BINARY
      maxlength 8
      pattern 0
    length 6
    foobar[space][space]
    T
    length 3
    baz[space][space][space][space][space]
    T
    ...

    So a document's value can be retrieved by seeking to startOffset + (9+pattern.length()+maxlength+2)*docid. The extra 9 is 2 newlines plus "length " itself. The extra 2 is the 'T' or 'F' flag and its newline: 'T' if the value is real, 'F' if it is missing.
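
    As a rough sketch of how a BINARY record could be decoded with that formula, the C# below reads one document's value from an in-memory copy of the data section. Everything named here is hypothetical and not part of Lucene.NET. It assumes '\n' newlines, that startOffset points at the first "length" line, and, purely for the demonstration, that the value bytes are ASCII text.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical sketch, not part of the Lucene.NET API.
public static class BinaryDocValuesSketch
{
    private const string LengthPrefix = "length ";

    // "length " (7) + length digits + '\n' + padded value + '\n' + flag + '\n'.
    public static int RecordWidth(string pattern, int maxLength)
    {
        return 9 + pattern.Length + maxLength + 2;
    }

    // Returns true if the record is flagged 'T' (a real value), false if 'F'.
    public static bool TryReadValue(byte[] data, long startOffset, string pattern,
                                    int maxLength, int docId, out string value)
    {
        long recordStart = startOffset + (long)RecordWidth(pattern, maxLength) * docId;
        int pos = checked((int)recordStart) + LengthPrefix.Length;

        // The length digits are as wide as the pattern.
        int length = int.Parse(Encoding.ASCII.GetString(data, pos, pattern.Length),
                               CultureInfo.InvariantCulture);
        pos += pattern.Length + 1; // skip the digits and their newline

        // Only the first 'length' bytes are the value; the rest is padding.
        value = Encoding.ASCII.GetString(data, pos, length);
        pos += maxLength + 1; // skip the padded value and its newline

        return data[pos] == (byte)'T';
    }
}
```

    For the sample records, which are padded to 8 bytes and use pattern 0, each record is 20 bytes wide, and reading docid 1 yields "baz" with the padding stripped.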

    For sorted bytes this is a fixed-width file, for example:

    field myField
      type SORTED
      numvalues 10
      maxlength 8
      pattern 0
      ordpattern 00
    length 6
    foobar[space][space]
    length 3
    baz[space][space][space][space][space]
    ...
    03
    06
    01
    10
    ...

    So the "ord section" begins at startOffset + (9+pattern.length()+maxlength)*numvalues. A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length()+maxlength)*ord.
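
    The three SORTED offsets can be written out as a small C# sketch. The class and method names are hypothetical (not part of Lucene.NET), and the code assumes one byte per character and that startOffset points at the first "length" line.

```csharp
using System;

// Hypothetical helper, not part of the Lucene.NET API.
public static class SortedDocValuesOffsets
{
    // One value record: "length " (7) + length digits + '\n' + padded value + '\n'.
    public static long ValueRecordWidth(string pattern, int maxLength)
    {
        return 9 + pattern.Length + maxLength;
    }

    // The ord section follows the numValues value records.
    public static long OrdSectionStart(long startOffset, string pattern,
                                       int maxLength, long numValues)
    {
        return startOffset + ValueRecordWidth(pattern, maxLength) * numValues;
    }

    // Each ord line is ordPattern.Length digits plus one newline.
    public static long DocOrdOffset(long ordSectionStart, string ordPattern, int docId)
    {
        return ordSectionStart + (1 + ordPattern.Length) * (long)docId;
    }

    // Position of the value record belonging to a given ord.
    public static long OrdValueOffset(long startOffset, string pattern,
                                      int maxLength, long ord)
    {
        return startOffset + ValueRecordWidth(pattern, maxLength) * ord;
    }
}
```

    With the header from the sample (numvalues 10, padded width 8, pattern 0, ordpattern 00) and a startOffset of 0, each value record is 18 bytes, the ord section starts at byte 180, and the ord line for docid 3 starts at byte 189.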

    For sorted set this is a fixed-width file very similar to the SORTED case, for example:

    field myField
      type SORTED_SET
      numvalues 10
      maxlength 8
      pattern 0
      ordpattern XXXXX
    length 6
    foobar[space][space]
    length 3
    baz[space][space][space][space][space]
    ...
    0,3,5   
    1,2
    
    10
    ...

    So the "ord section" begins at startOffset + (9+pattern.length()+maxlength)*numvalues. A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. The list is comma-separated and padded with spaces to a fixed width, so trim() and split() it, and beware the empty string, which is what a document with no ords produces. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length()+maxlength)*ord.
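
    The trim-and-split step, including the empty-string case, might look like this in C#. The class and method names are hypothetical and not part of Lucene.NET; the sketch assumes the padded line has already been read from the ord section.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the Lucene.NET API.
public static class SortedSetOrdListParser
{
    public static long[] ParseOrds(string paddedLine)
    {
        // Strip the spaces used to pad the list to a fixed width.
        string trimmed = paddedLine.Trim();

        // "".Split(',') yields one empty element, which long.Parse would
        // reject, so a document without ords is handled up front.
        if (trimmed.Length == 0)
        {
            return Array.Empty<long>();
        }

        string[] parts = trimmed.Split(',');
        long[] ords = new long[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            ords[i] = long.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return ords;
    }
}
```

    Applied to the sample lines, "0,3,5   " parses to the ords 0, 3 and 5, while the blank line parses to an empty array.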

    When it opens, the reader can simply scan this file, skipping over the data blocks and saving the offset and other header details for each field.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    DocValuesFormat
    SimpleTextDocValuesFormat
    Inherited Members
    DocValuesFormat.SetDocValuesFormatFactory(IDocValuesFormatFactory)
    DocValuesFormat.GetDocValuesFormatFactory()
    DocValuesFormat.Name
    DocValuesFormat.ToString()
    DocValuesFormat.ForName(string)
    DocValuesFormat.AvailableDocValuesFormats
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Codecs.SimpleText
    Assembly: Lucene.Net.Codecs.dll
    Syntax
    [DocValuesFormatName("SimpleText")]
    public class SimpleTextDocValuesFormat : DocValuesFormat

    Constructors

    SimpleTextDocValuesFormat()

    Plain text doc values format.

    Declaration
    public SimpleTextDocValuesFormat()

    Methods

    FieldsConsumer(SegmentWriteState)

    Returns a Lucene.Net.Codecs.DocValuesConsumer to write docvalues to the index.

    Declaration
    public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)
    Parameters
    Type Name Description
    SegmentWriteState state
    Returns
    Type Description
    DocValuesConsumer
    Overrides
    Lucene.Net.Codecs.DocValuesFormat.FieldsConsumer(Lucene.Net.Index.SegmentWriteState)

    FieldsProducer(SegmentReadState)

    Returns a Lucene.Net.Codecs.DocValuesProducer to read docvalues from the index.

    NOTE: By the time this call returns, the producer must hold open any files it will need to use; otherwise, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances the implementation should throw an IOException. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
    Declaration
    public override DocValuesProducer FieldsProducer(SegmentReadState state)
    Parameters
    Type Name Description
    SegmentReadState state
    Returns
    Type Description
    DocValuesProducer
    Overrides
    Lucene.Net.Codecs.DocValuesFormat.FieldsProducer(Lucene.Net.Index.SegmentReadState)
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.