Namespace Lucene.Net.Util.Packed

Packed integer arrays and streams.

The packed package provides * sequential and random access capable arrays of positive longs, * routines for efficient serialization and deserialization of streams of packed integers. The implementations provide different trade-offs between memory usage and access speed. The standard usage scenario is replacing large int or long arrays in order to reduce the memory footprint.

The main access point is the <xref:Lucene.Net.Util.Packed.PackedInts> factory.

In-memory structures

<xref:Lucene.Net.Util.Packed.PackedInts.Mutable>
- Only supports positive longs.
- Requires the number of bits per value to be known in advance.
- Random-access for both writing and reading.
GrowableWriter
- Same as PackedInts.Mutable but grows the number of bits per values when needed.
- Useful to build a PackedInts.Mutable from a read-once stream of longs.
PagedGrowableWriter
- Slices data into fixed-size blocks stored in GrowableWriters.
- Supports more than 2B values.
- You should use Appending(Delta)PackedLongBuffer instead if you don't need random write access.
<xref:Lucene.Net.Util.Packed.AppendingDeltaPackedLongBuffer>
- Can store any sequence of longs.
- Compression is good when values are close to each other.
- Supports random reads, but only sequential writes.
- Can address up to 2^42 values.
<xref:Lucene.Net.Util.Packed.AppendingPackedLongBuffer>
- Same as AppendingDeltaPackedLongBuffer but assumes values are 0-based.
<xref:Lucene.Net.Util.Packed.MonotonicAppendingLongBuffer>
- Same as AppendingDeltaPackedLongBuffer except that compression is good when the stream is a succession of affine functions.

Disk-based structures

<xref:Lucene.Net.Util.Packed.PackedInts.Writer>, <xref:Lucene.Net.Util.Packed.PackedInts.Reader>, <xref:Lucene.Net.Util.Packed.PackedInts.ReaderIterator>
- Only supports positive longs.
- Requires the number of bits per value to be known in advance.
- Supports both fast sequential access with low memory footprint with ReaderIterator and random-access by either loading values in memory or leaving them on disk with Reader.
BlockPackedWriter, BlockPackedReader, BlockPackedReaderIterator
- Splits the stream into fixed-size blocks.
- Compression is good when values are close to each other.
- Can address up to 2B * blockSize values.
MonotonicBlockPackedWriter, MonotonicBlockPackedReader
- Same as the non-monotonic variants except that compression is good when the stream is a succession of affine functions.
- The reason why there is no sequential access is that if you need sequential access, you should rather delta-encode and use BlockPackedWriter.
PackedDataOutput, PackedDataInput
- Writes sequences of longs where each long can use any number of bits.

Classes

AbstractAppendingInt64Buffer

Common functionality shared by AppendingDeltaPackedInt64Buffer and MonotonicAppendingInt64Buffer.

NOTE: This was AbstractAppendingLongBuffer in Lucene

AbstractAppendingInt64Buffer.Iterator

AbstractBlockPackedWriter

AbstractPagedMutable<T>

Base implementation for PagedMutable and PagedGrowableWriter.

This is a Lucene.NET INTERNAL API, use at your own risk

AppendingDeltaPackedInt64Buffer

Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for the case where values are close to each other.

NOTE: This was AppendingDeltaPackedLongBuffer in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

AppendingPackedInt64Buffer

Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for non-negative numbers with a uniform distribution over a fixed (limited) range.

NOTE: This was AppendingPackedLongBuffer in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

BlockPackedReader

Provides random access to a stream written with BlockPackedWriter.

This is a Lucene.NET INTERNAL API, use at your own risk

BlockPackedReaderIterator

Reader for sequences of s written with BlockPackedWriter.

This is a Lucene.NET INTERNAL API, use at your own risk

BlockPackedWriter

A writer for large sequences of longs.

The sequence is divided into fixed-size blocks and for each block, the difference between each value and the minimum value of the block is encoded using as few bits as possible. Memory usage of this class is proportional to the block size. Each block has an overhead between 1 and 10 bytes to store the minimum value and the number of bits per value of the block.

Format:

<BLock>^BlockCount
BlockCount: ⌈ ValueCount / BlockSize ⌉
Block: <Header, (Ints)>
Header: <Token, (MinValue)>
Token: a byte (WriteByte(Byte)), first 7 bits are the number of bits per value (bitsPerValue). If the 8th bit is 1, then MinValue (see next) is 0, otherwise MinValue and needs to be decoded
MinValue: a zigzag-encoded variable-length (WriteVInt64(Int64)) whose value should be added to every int from the block to restore the original values
Ints: If the number of bits per value is 0, then there is nothing to decode and all ints are equal to MinValue. Otherwise: BlockSize packed ints (PackedInt32s) encoded on exactly bitsPerValue bits per value. They are the subtraction of the original values and MinValue

This is a Lucene.NET INTERNAL API, use at your own risk

EliasFanoDecoder

A decoder for an EliasFanoEncoder.

This is a Lucene.NET INTERNAL API, use at your own risk

EliasFanoDocIdSet

A DocIdSet in Elias-Fano encoding.

This is a Lucene.NET INTERNAL API, use at your own risk

EliasFanoEncoder

Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.

The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of numValues > 0 natural numbers x[i]

0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound

where upperBound > 0 is an upper bound on the last value.

The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.

The lower L bits of each x[i] are stored explicitly and contiguously in the lower-bits array, with L chosen as (Log() base 2):

L = max(0, floor(log(upperBound/numValues)))

The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0):

(x[i]/2L) - (x[i-1]/2L)

The unary code encodes a natural number n by n 0 bits followed by a 1 bit: 0...01.

In the upper bits the total the number of 1 bits is numValues and the total number of 0 bits is:

floor(x[numValues-1]/2L) <= upperBound/(2max(0, floor(log(upperBound/numValues)))) <= 2numValues

The Elias-Fano encoding uses at most

2 + Ceil(Log(upperBound/numValues))

bits per encoded number. With upperBound in these bounds (p is an integer):

2p < x[numValues-1] <= upperBound <= 2*(p+1)

the number of bits per encoded number is minimized.

In this implementation the values in the sequence can be given as long, numValues = 0 and upperBound = 0 are allowed, and each of the upper and lower bit arrays should fit in a long[].

An index of positions of zero's in the upper bits is also built.

this implementation is based on this article:

Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300 .

The articles originally describing the Elias-Fano representation are:

Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246â€"260, 1974.

Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.

This is a Lucene.NET INTERNAL API, use at your own risk

GrowableWriter

Implements PackedInt32s.Mutable, but grows the bit count of the underlying packed ints on-demand.

Beware that this class will accept to set negative values but in order to do this, it will grow the number of bits per value to 64.

This is a Lucene.NET INTERNAL API, use at your own risk

MonotonicAppendingInt64Buffer

Utility class to buffer signed longs in memory, which is optimized for the case where the sequence is monotonic, although it can encode any sequence of arbitrary longs. It only supports appending.

NOTE: This was MonotonicAppendingLongBuffer in Lucene.

This is a Lucene.NET INTERNAL API, use at your own risk

MonotonicBlockPackedReader

Provides random access to a stream written with MonotonicBlockPackedWriter.

This is a Lucene.NET INTERNAL API, use at your own risk

MonotonicBlockPackedWriter

A writer for large monotonically increasing sequences of positive s.

The sequence is divided into fixed-size blocks and for each block, values are modeled after a linear function f: x → A × x + B. The block encodes deltas from the expected values computed from this function using as few bits as possible. Each block has an overhead between 6 and 14 bytes.

Format:

<BLock>^BlockCount
BlockCount: ⌈ ValueCount / BlockSize ⌉
Block: <Header, (Ints)>
Header: <B, A, BitsPerValue>
B: the B from f: x → A × x + B using a variable-length (WriteVInt64(Int64))
A: the A from f: x → A × x + B encoded using on 4 bytes (WriteVInt32(Int32))
BitsPerValue: a variable-length (WriteVInt32(Int32))
Ints: if BitsPerValue is 0, then there is nothing to read and all values perfectly match the result of the function. Otherwise, these are the zigzag-encoded packed (PackedInt32s) deltas from the expected value (computed from the function) using exaclty BitsPerValue bits per value

This is a Lucene.NET INTERNAL API, use at your own risk

Packed64

Space optimized random access capable array of values with a fixed number of bits/value. Values are packed contiguously.

The implementation strives to perform af fast as possible under the constraint of contiguous bits, by avoiding expensive operations. This comes at the cost of code clarity.

Technical details: this implementation is a refinement of a non-branching version. The non-branching get and set methods meant that 2 or 4 atomics in the underlying array were always accessed, even for the cases where only 1 or 2 were needed. Even with caching, this had a detrimental effect on performance. Related to this issue, the old implementation used lookup tables for shifts and masks, which also proved to be a bit slower than calculating the shifts and masks on the fly. See https://issues.apache.org/jira/browse/LUCENE-4062 for details.

PackedDataInput

A DataInput wrapper to read unaligned, variable-length packed integers. This API is much slower than the PackedInt32s fixed-length API but can be convenient to save space.

This is a Lucene.NET INTERNAL API, use at your own risk

PackedDataOutput

A DataOutput wrapper to write unaligned, variable-length packed integers.

This is a Lucene.NET INTERNAL API, use at your own risk

PackedInt32s

Simplistic compression for array of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.

NOTE: This was PackedInts in Lucene.

This is a Lucene.NET INTERNAL API, use at your own risk

PagedMutable

A PagedMutable. This class slices data into fixed-size blocks which have the same number of bits per value. It can be a useful replacement for PackedInt32s.Mutable to store more than 2B values.

This is a Lucene.NET INTERNAL API, use at your own risk

Namespace Lucene.Net.Util.Packed

Packed integer arrays and streams.

In-memory structures

Disk-based structures

Classes

AbstractAppendingInt64Buffer

AbstractAppendingInt64Buffer.Iterator

AbstractBlockPackedWriter

AbstractPagedMutable<T>

AppendingDeltaPackedInt64Buffer

AppendingPackedInt64Buffer

BlockPackedReader

BlockPackedReaderIterator

BlockPackedWriter

EliasFanoDecoder

EliasFanoDocIdSet

EliasFanoEncoder

GrowableWriter

MonotonicAppendingInt64Buffer

MonotonicBlockPackedReader

MonotonicBlockPackedWriter

Packed64

PackedDataInput

PackedDataOutput

PackedInt32s

PackedInt32s.Format

PackedInt32s.FormatAndBits

PackedInt32s.Header

PackedInt32s.Mutable

PackedInt32s.MutableImpl

PackedInt32s.NullReader

PackedInt32s.Reader

PackedInt32s.Writer

PagedGrowableWriter

PagedMutable

Interfaces

PackedInt32s.IDecoder

PackedInt32s.IEncoder

PackedInt32s.IReaderIterator