    Namespace Lucene.Net.Util.Packed

    Packed integer arrays and streams.

    The packed package provides

    • sequential and random access capable arrays of non-negative longs,

    • routines for efficient serialization and deserialization of streams of packed integers.

    The implementations provide different trade-offs between memory usage and access speed. The standard usage scenario is replacing large int or long arrays in order to reduce the memory footprint.

    The main access point is the PackedInt32s factory.

    In-memory structures

    • PackedInt32s.Mutable

      • Only supports non-negative longs.

      • Requires the number of bits per value to be known in advance.

      • Random-access for both writing and reading.

    • GrowableWriter

      • Same as PackedInt32s.Mutable but grows the number of bits per value when needed.

      • Useful to build a PackedInt32s.Mutable from a read-once stream of longs.

    • PagedGrowableWriter

      • Slices data into fixed-size blocks stored in GrowableWriters.

      • Supports more than 2B values.

      • You should use Appending(Delta)PackedInt64Buffer instead if you don't need random write access.

    • AppendingDeltaPackedInt64Buffer

      • Can store any sequence of longs.

      • Compression is good when values are close to each other.

      • Supports random reads, but only sequential writes.

      • Can address up to 2^42 values.

    • AppendingPackedInt64Buffer

      • Same as AppendingDeltaPackedInt64Buffer but assumes values are 0-based.

    • MonotonicAppendingInt64Buffer

      • Same as AppendingDeltaPackedInt64Buffer except that compression is good when the stream is a succession of affine functions.
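
    As a rough illustration of why these structures reduce the memory footprint, the sketch below computes how many bits a value range needs and how many bytes a contiguously packed array would occupy. It is plain Python, not Lucene.NET code, and the helper names bits_required and packed_size_bytes are hypothetical.

```python
def bits_required(max_value: int) -> int:
    """Smallest number of bits able to represent every value in [0, max_value]."""
    if max_value < 0:
        raise ValueError("packed arrays store non-negative values only")
    # bit_length() is 0 for the value 0; this sketch uses at least 1 bit.
    return max(1, max_value.bit_length())


def packed_size_bytes(value_count: int, bits_per_value: int) -> int:
    """Bytes used when values are packed contiguously into 64-bit blocks."""
    blocks = (value_count * bits_per_value + 63) // 64  # round up to whole blocks
    return blocks * 8
```

    Under these assumptions, 1,000 values that are all below 1,024 need 10 bits each, so the packed form takes 1,256 bytes where a plain array of longs would take 8,000.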

    Disk-based structures

    • PackedInt32s.Writer, PackedInt32s.Reader, PackedInt32s.IReaderIterator

      • Only supports non-negative longs.

      • Requires the number of bits per value to be known in advance.

      • Supports both fast sequential access with a low memory footprint (via ReaderIterator) and random access, either by loading values into memory or by leaving them on disk (via Reader).

    • BlockPackedWriter, BlockPackedReader, BlockPackedReaderIterator

      • Splits the stream into fixed-size blocks.

      • Compression is good when values are close to each other.

      • Can address up to 2B * blockSize values.

    • MonotonicBlockPackedWriter, MonotonicBlockPackedReader

      • Same as the non-monotonic variants except that compression is good when the stream is a succession of affine functions.

      • No sequential-access reader is provided: if you need sequential access, you should delta-encode the values and use BlockPackedWriter instead.

    • PackedDataOutput, PackedDataInput

      • Writes sequences of longs where each long can use any number of bits.

    Classes

    AbstractAppendingInt64Buffer

    Common functionality shared by AppendingDeltaPackedInt64Buffer and MonotonicAppendingInt64Buffer.

    NOTE: This was AbstractAppendingLongBuffer in Lucene

    AbstractAppendingInt64Buffer.Iterator

    AbstractBlockPackedWriter

    AbstractPagedMutable<T>

    Base implementation for PagedMutable and PagedGrowableWriter.

    This is a Lucene.NET INTERNAL API, use at your own risk

    AppendingDeltaPackedInt64Buffer

    Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for the case where values are close to each other.

    NOTE: This was AppendingDeltaPackedLongBuffer in Lucene

    This is a Lucene.NET INTERNAL API, use at your own risk

    AppendingPackedInt64Buffer

    Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for non-negative numbers with a uniform distribution over a fixed (limited) range.

    NOTE: This was AppendingPackedLongBuffer in Lucene

    This is a Lucene.NET INTERNAL API, use at your own risk

    BlockPackedReader

    Provides random access to a stream written with BlockPackedWriter.

    This is a Lucene.NET INTERNAL API, use at your own risk

    BlockPackedReaderIterator

    Reader for sequences of longs written with BlockPackedWriter.

    This is a Lucene.NET INTERNAL API, use at your own risk

    BlockPackedWriter

    A writer for large sequences of longs.

    The sequence is divided into fixed-size blocks and for each block, the difference between each value and the minimum value of the block is encoded using as few bits as possible. Memory usage of this class is proportional to the block size. Each block has an overhead between 1 and 10 bytes to store the minimum value and the number of bits per value of the block.

    Format:

    • <Block>^BlockCount (the Block structure repeated BlockCount times)
    • BlockCount: ⌈ ValueCount / BlockSize ⌉
    • Block: <Header, (Ints)>
    • Header: <Token, (MinValue)>
    • Token: a byte (WriteByte(Byte)); the first 7 bits are the number of bits per value (bitsPerValue). If the 8th bit is 1, then MinValue (see next) is 0; otherwise MinValue is non-zero and needs to be decoded
    • MinValue: a zigzag-encoded variable-length long (WriteVInt64(Int64)) whose value should be added to every int from the block to restore the original values
    • Ints: if the number of bits per value is 0, then there is nothing to decode and all ints are equal to MinValue. Otherwise: BlockSize packed ints (PackedInt32s) encoded on exactly bitsPerValue bits per value. Each one is the original value minus MinValue

    This is a Lucene.NET INTERNAL API, use at your own risk
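
    The block format described for BlockPackedWriter can be sketched as follows. This is a minimal Python illustration of the idea (per-block minimum, zigzag-encoded MinValue, deltas that fit in bitsPerValue bits); it does not reproduce the byte-level layout, and the function names and the dictionary fields are hypothetical.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_block(values):
    """Describe one block: bits per value, MinValue, and deltas from MinValue."""
    min_value = min(values)
    deltas = [v - min_value for v in values]
    bits_per_value = max(deltas).bit_length()  # 0 when all values are equal
    return {
        "bits_per_value": bits_per_value,
        "min_is_zero": min_value == 0,  # the flag carried by the token
        "min_zigzag": None if min_value == 0 else zigzag_encode(min_value),
        "deltas": [] if bits_per_value == 0 else deltas,
    }


def decode_block(block, block_size):
    """Restore the original values of one block."""
    min_value = 0 if block["min_is_zero"] else zigzag_decode(block["min_zigzag"])
    if block["bits_per_value"] == 0:
        return [min_value] * block_size  # nothing to decode
    return [min_value + d for d in block["deltas"]]
```

    For a block such as [1000, 1003, 1001, 1007], the deltas are at most 7, so 3 bits per value suffice instead of the 10 bits the raw values would need.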

    EliasFanoDecoder

    A decoder for an EliasFanoEncoder.

    This is a Lucene.NET INTERNAL API, use at your own risk

    EliasFanoDocIdSet

    A DocIdSet in Elias-Fano encoding.

    This is a Lucene.NET INTERNAL API, use at your own risk

    EliasFanoEncoder

    Encodes a non-decreasing sequence of non-negative whole numbers in the Elias-Fano encoding that was introduced in the 1970s by Peter Elias and Robert Fano.

    The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of numValues > 0 natural numbers x[i]

    0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound

    where upperBound > 0 is an upper bound on the last value.

    The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.

    The lower L bits of each x[i] are stored explicitly and contiguously in the lower-bits array, with L chosen as (log is base 2):

    L = max(0, floor(log(upperBound/numValues)))

    The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0):

    (x[i]/2^L) - (x[i-1]/2^L)

    The unary code encodes a natural number n by n 0 bits followed by a 1 bit: 0...01.

    In the upper bits the total number of 1 bits is numValues and the total number of 0 bits is:

    floor(x[numValues-1]/2^L) <= upperBound/(2^max(0, floor(log(upperBound/numValues)))) <= 2*numValues

    The Elias-Fano encoding uses at most

    2 + Ceil(Log(upperBound/numValues))

    bits per encoded number. With upperBound in these bounds (p is an integer):

    2^p < x[numValues-1] <= upperBound <= 2^(p+1)

    the number of bits per encoded number is minimized.

    In this implementation the values in the sequence can be given as long, numValues = 0 and upperBound = 0 are allowed, and each of the upper and lower bit arrays should fit in a long[].

    An index of positions of zeros in the upper bits is also built.

    This implementation is based on this article:

    Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300 .

    The articles originally describing the Elias-Fano representation are:

    Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246–260, 1974.

    Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.

    This is a Lucene.NET INTERNAL API, use at your own risk
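
    The Elias-Fano description for EliasFanoEncoder can be made concrete with a small sketch. The Python below follows the formulas given there (L from upperBound/numValues, explicit lower bits, unary-coded gaps for the upper bits) but stores bits in plain lists rather than long arrays and builds no index; the function names are hypothetical.

```python
def elias_fano_encode(values, upper_bound):
    """Return (L, lower_bits, upper_bits) for a non-decreasing sequence."""
    n = len(values)
    if n == 0:
        return 0, [], []
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("values must be non-decreasing")
    if values[0] < 0 or values[-1] > upper_bound:
        raise ValueError("values must lie in [0, upper_bound]")
    # L = max(0, floor(log2(upperBound / numValues))), in integer arithmetic.
    low_bit_count = max(0, (upper_bound // n).bit_length() - 1)
    low_mask = (1 << low_bit_count) - 1
    lower = [v & low_mask for v in values]
    upper = []
    prev_high = 0  # corresponds to x[-1] = 0
    for v in values:
        high = v >> low_bit_count
        upper.extend([0] * (high - prev_high))  # unary gap: n zero bits...
        upper.append(1)                         # ...followed by a one bit
        prev_high = high
    return low_bit_count, lower, upper


def elias_fano_decode(low_bit_count, lower, upper):
    """Rebuild the sequence from its lower bits and unary-coded upper bits."""
    values = []
    high = 0
    index = 0
    for bit in upper:
        if bit == 0:
            high += 1
        else:
            values.append((high << low_bit_count) | lower[index])
            index += 1
    return values
```

    With the values [2, 3, 5, 7, 11, 13, 24] and upperBound 24, L is 1, the upper bits contain 7 one bits (one per value) and 12 zero bits, which stays within the 2*numValues bound of 14.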

    GrowableWriter

    Implements PackedInt32s.Mutable, but grows the bit count of the underlying packed ints on-demand.

    Beware that this class accepts negative values, but storing one grows the number of bits per value to 64.

    This is a Lucene.NET INTERNAL API, use at your own risk
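
    The growth behaviour of GrowableWriter can be illustrated with a short sketch. The Python class below only tracks how the bit count would grow; a plain list stands in for the packed storage that a real implementation would repack, and the class and method names are hypothetical.

```python
class GrowableSketch:
    """Tracks the bits per value needed as values are written."""

    def __init__(self, size, start_bits_per_value=1):
        self.bits_per_value = start_bits_per_value
        self.values = [0] * size  # stand-in for the underlying packed array

    def set(self, index, value):
        # A negative value needs the full 64 bits of a signed long.
        needed = 64 if value < 0 else max(1, value.bit_length())
        if needed > self.bits_per_value:
            # A real implementation would copy into a wider packed array here.
            self.bits_per_value = needed
        self.values[index] = value

    def get(self, index):
        return self.values[index]
```

    Writing 5 grows the sketch to 3 bits per value, writing 300 grows it to 9, and writing -1 grows it to 64, which is the cost the warning about negative values refers to.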

    MonotonicAppendingInt64Buffer

    Utility class to buffer signed longs in memory, which is optimized for the case where the sequence is monotonic, although it can encode any sequence of arbitrary longs. It only supports appending.

    NOTE: This was MonotonicAppendingLongBuffer in Lucene.

    This is a Lucene.NET INTERNAL API, use at your own risk

    MonotonicBlockPackedReader

    Provides random access to a stream written with MonotonicBlockPackedWriter.

    This is a Lucene.NET INTERNAL API, use at your own risk

    MonotonicBlockPackedWriter

    A writer for large monotonically increasing sequences of positive longs.

    The sequence is divided into fixed-size blocks and for each block, values are modeled after a linear function f: x → A × x + B. The block encodes deltas from the expected values computed from this function using as few bits as possible. Each block has an overhead between 6 and 14 bytes.

    Format:

    • <Block>^BlockCount (the Block structure repeated BlockCount times)
    • BlockCount: ⌈ ValueCount / BlockSize ⌉
    • Block: <Header, (Ints)>
    • Header: <B, A, BitsPerValue>
    • B: the B from f: x → A × x + B, encoded as a variable-length long (WriteVInt64(Int64))
    • A: the A from f: x → A × x + B, a float whose bit pattern is encoded on 4 bytes (WriteVInt32(Int32))
    • BitsPerValue: a variable-length int (WriteVInt32(Int32))
    • Ints: if BitsPerValue is 0, then there is nothing to read and all values perfectly match the result of the function. Otherwise, these are the zigzag-encoded packed (PackedInt32s) deltas from the expected value (computed from the function) using exactly BitsPerValue bits per value

    This is a Lucene.NET INTERNAL API, use at your own risk
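
    The linear model used by MonotonicBlockPackedWriter can be sketched as follows. This Python illustration takes B as the first value of the block and A as the average slope, which is an assumption about how the two are chosen, and it uses Python floats rather than 32-bit floats; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_monotonic_block(values):
    """Return (B, A, bits_per_value, deltas) for f: x -> A * x + B."""
    n = len(values)
    b = values[0]
    a = 0.0 if n == 1 else (values[-1] - b) / (n - 1)  # assumed: average slope
    deltas = [zigzag_encode(v - (b + int(a * i))) for i, v in enumerate(values)]
    bits_per_value = max(deltas).bit_length()  # 0 when the function fits exactly
    return b, a, bits_per_value, ([] if bits_per_value == 0 else deltas)


def decode_monotonic_block(b, a, bits_per_value, deltas, block_size):
    """Recompute the expected values and add back the stored deltas."""
    out = []
    for i in range(block_size):
        delta = zigzag_decode(deltas[i]) if bits_per_value else 0
        out.append(b + int(a * i) + delta)
    return out
```

    A perfectly linear block such as [100, 110, 120, 130] needs 0 bits per value, while [100, 111, 119, 130] deviates from the line by at most 1 and needs only 2 bits per zigzag-encoded delta.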

    Packed64

    Space optimized random access capable array of values with a fixed number of bits/value. Values are packed contiguously.

    The implementation strives to perform as fast as possible under the constraint of contiguous bits, by avoiding expensive operations. This comes at the cost of code clarity.

    Technical details: this implementation is a refinement of a non-branching version. The non-branching get and set methods meant that 2 or 4 atomics in the underlying array were always accessed, even for the cases where only 1 or 2 were needed. Even with caching, this had a detrimental effect on performance. Related to this issue, the old implementation used lookup tables for shifts and masks, which also proved to be a bit slower than calculating the shifts and masks on the fly. See https://issues.apache.org/jira/browse/LUCENE-4062 for details.
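
    To show what contiguous packing means for Packed64, here is a minimal Python sketch of a fixed-width array stored in 64-bit blocks, where a value may straddle two blocks. The least-significant-bit-first layout is an assumption made for simplicity and is not claimed to match the actual Packed64 bit order; the class name is hypothetical.

```python
class Packed64Sketch:
    """Fixed bits-per-value array packed contiguously into 64-bit blocks."""

    MASK64 = (1 << 64) - 1

    def __init__(self, value_count, bits_per_value):
        if not 1 <= bits_per_value <= 64:
            raise ValueError("bits_per_value must be in 1..64")
        self.bits_per_value = bits_per_value
        self.value_mask = (1 << bits_per_value) - 1
        self.blocks = [0] * ((value_count * bits_per_value + 63) // 64)

    def get(self, index):
        block, offset = divmod(index * self.bits_per_value, 64)
        value = self.blocks[block] >> offset
        spill = offset + self.bits_per_value - 64  # bits living in the next block
        if spill > 0:
            value |= self.blocks[block + 1] << (self.bits_per_value - spill)
        return value & self.value_mask

    def set(self, index, value):
        if not 0 <= value <= self.value_mask:
            raise ValueError("value does not fit in bits_per_value bits")
        block, offset = divmod(index * self.bits_per_value, 64)
        cleared = self.blocks[block] & ~(self.value_mask << offset)
        self.blocks[block] = (cleared | (value << offset)) & self.MASK64
        spill = offset + self.bits_per_value - 64
        if spill > 0:
            low_bits = self.bits_per_value - spill  # bits kept in the first block
            cleared = self.blocks[block + 1] & ~(self.value_mask >> low_bits)
            self.blocks[block + 1] = cleared | (value >> low_bits)
```

    With 10 values of 7 bits each, the array occupies 70 bits, so two blocks, and the last value straddles the block boundary. Only the blocks a value actually touches are read or written, which is the branching behaviour the technical details describe.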

    PackedDataInput

    A DataInput wrapper to read unaligned, variable-length packed integers. This API is much slower than the PackedInt32s fixed-length API but can be convenient to save space.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PackedDataOutput

    A DataOutput wrapper to write unaligned, variable-length packed integers.

    This is a Lucene.NET INTERNAL API, use at your own risk
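
    The idea behind PackedDataOutput and PackedDataInput, where each value may use a different number of bits, can be sketched with a simple bit stream. The Python below writes most-significant bit first into a byte buffer; that bit order, the one-bit-at-a-time loop, and the class names are assumptions chosen for clarity rather than a description of the real wire format.

```python
class PackedBitWriter:
    """Appends values using a caller-chosen number of bits each."""

    def __init__(self):
        self.data = bytearray()
        self.bit_count = 0

    def write(self, value, bits_per_value):
        if not 0 <= value < (1 << bits_per_value):
            raise ValueError("value does not fit in bits_per_value bits")
        for shift in range(bits_per_value - 1, -1, -1):
            if self.bit_count % 8 == 0:
                self.data.append(0)  # start a new byte
            bit = (value >> shift) & 1
            self.data[-1] |= bit << (7 - self.bit_count % 8)
            self.bit_count += 1


class PackedBitReader:
    """Reads values back; the caller supplies the same bit widths in order."""

    def __init__(self, data):
        self.data = data
        self.position = 0  # position in bits

    def read(self, bits_per_value):
        value = 0
        for _ in range(bits_per_value):
            byte = self.data[self.position // 8]
            bit = (byte >> (7 - self.position % 8)) & 1
            value = (value << 1) | bit
            self.position += 1
        return value
```

    Writing 5 on 3 bits, 1000 on 10 bits, and 1 on 1 bit takes 14 bits, so 2 bytes in total. The reader has to be told the same widths in the same order, since the stream itself carries no width information in this sketch.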

    PackedInt32s

    Simplistic compression for arrays of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.

    NOTE: This was PackedInts in Lucene.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PackedInt32s.Format

    A format to write packed ints.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PackedInt32s.FormatAndBits

    Simple class that holds a format and a number of bits per value.

    PackedInt32s.Header

    Header identifying the structure of a packed integer array.

    PackedInt32s.Mutable

    A packed integer array that can be modified.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PackedInt32s.MutableImpl

    PackedInt32s.NullReader

    A PackedInt32s.Reader which has all its values equal to 0 (bitsPerValue = 0).

    PackedInt32s.Reader

    A read-only random access array of positive integers.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PackedInt32s.Writer

    A write-once Writer.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PagedGrowableWriter

    A PagedGrowableWriter. This class slices data into fixed-size blocks which have independent numbers of bits per value and grow on-demand.

    You should use this class instead of the AbstractAppendingInt64Buffer-related classes only when you need random write access. Otherwise this class will likely be slower and less memory-efficient.

    This is a Lucene.NET INTERNAL API, use at your own risk

    PagedMutable

    A PagedMutable. This class slices data into fixed-size blocks which have the same number of bits per value. It can be a useful replacement for PackedInt32s.Mutable to store more than 2B values.

    This is a Lucene.NET INTERNAL API, use at your own risk
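
    The paging used by PagedMutable and PagedGrowableWriter amounts to splitting one large index space into fixed-size pages. The Python sketch below shows the index arithmetic only, with plain lists standing in for the per-page packed arrays; it assumes a power-of-two page size, and the class name is hypothetical.

```python
class PagedSketch:
    """Splits a logical array into fixed-size pages addressed by shift and mask."""

    def __init__(self, size, page_size):
        if page_size <= 0 or page_size & (page_size - 1):
            raise ValueError("page_size must be a power of two")
        self.size = size
        self.page_shift = page_size.bit_length() - 1
        self.page_mask = page_size - 1
        page_count = (size + page_size - 1) // page_size
        # The last page may be shorter than the others.
        self.pages = [
            [0] * min(page_size, size - p * page_size) for p in range(page_count)
        ]

    def get(self, index):
        return self.pages[index >> self.page_shift][index & self.page_mask]

    def set(self, index, value):
        self.pages[index >> self.page_shift][index & self.page_mask] = value
```

    Because each page is addressed with an ordinary int offset while the page number comes from the high bits of a long index, the logical array can hold more values than a single array could.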

    Interfaces

    PackedInt32s.IDecoder

    A decoder for packed integers.

    PackedInt32s.IEncoder

    An encoder for packed integers.

    PackedInt32s.IReaderIterator

    Run-once iterator interface, to decode previously saved PackedInt32s.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)