Namespace Lucene.Net.Util.Packed

Packed integer arrays and streams.

The packed package provides

sequential and random access capable arrays of positive longs,
routines for efficient serialization and deserialization of streams of packed integers.

The implementations provide different trade-offs between memory usage and access speed. The standard usage scenario is replacing large int or long arrays in order to reduce the memory footprint.

The main access point is the PackedInt32s factory.

In-memory structures

PackedInt32s.Mutable
- Only supports positive longs.
- Requires the number of bits per value to be known in advance.
- Random-access for both writing and reading.
GrowableWriter
- Same as PackedInts.Mutable but grows the number of bits per values when needed.
- Useful to build a PackedInts.Mutable from a read-once stream of longs.
PagedGrowableWriter
- Slices data into fixed-size blocks stored in GrowableWriters.
- Supports more than 2B values.
- You should use Appending(Delta)PackedLongBuffer instead if you don't need random write access.
AppendingDeltaPackedInt64Buffer
- Can store any sequence of longs.
- Compression is good when values are close to each other.
- Supports random reads, but only sequential writes.
- Can address up to 2^42 values.
AppendingPackedInt64Buffer
- Same as AppendingDeltaPackedInt64Buffer but assumes values are 0-based.
MonotonicAppendingInt64Buffer
- Same as AppendingDeltaPackedInt64Buffer except that compression is good when the stream is a succession of affine functions.

Disk-based structures

PackedInt32s.Writer, PackedInt32s.Reader, PackedInt32s.IReaderIterator
- Only supports positive longs.
- Requires the number of bits per value to be known in advance.
- Supports both fast sequential access with low memory footprint with ReaderIterator and random-access by either loading values in memory or leaving them on disk with Reader.
BlockPackedWriter, BlockPackedReader, BlockPackedReaderIterator
- Splits the stream into fixed-size blocks.
- Compression is good when values are close to each other.
- Can address up to 2B * blockSize values.
MonotonicBlockPackedWriter, MonotonicBlockPackedReader
- Same as the non-monotonic variants except that compression is good when the stream is a succession of affine functions.
- The reason why there is no sequential access is that if you need sequential access, you should rather delta-encode and use BlockPackedWriter.
PackedDataOutput, PackedDataInput
- Writes sequences of longs where each long can use any number of bits.

Classes

AbstractAppendingInt64Buffer

Common functionality shared by AppendingDeltaPackedInt64Buffer and MonotonicAppendingInt64Buffer.

NOTE: This was AbstractAppendingLongBuffer in Lucene

AbstractAppendingInt64Buffer.Iterator

AbstractPagedMutable<T>

Base implementation for PagedMutable and PagedGrowableWriter.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

AppendingDeltaPackedInt64Buffer

Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for the case where values are close to each other.

NOTE: This was AppendingDeltaPackedLongBuffer in Lucene

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

AppendingPackedInt64Buffer

Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for non-negative numbers with a uniform distribution over a fixed (limited) range.

NOTE: This was AppendingPackedLongBuffer in Lucene

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

BlockPackedReader

Provides random access to a stream written with BlockPackedWriter.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

BlockPackedReaderIterator

Reader for sequences of System.Int64s written with BlockPackedWriter.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

The sequence is divided into fixed-size blocks and for each block, the difference between each value and the minimum value of the block is encoded using as few bits as possible. Memory usage of this class is proportional to the block size. Each block has an overhead between 1 and 10 bytes to store the minimum value and the number of bits per value of the block.

Format:

<BLock>^BlockCount
BlockCount: ⌈ ValueCount / BlockSize ⌉
Block: <Header, (Ints)>
Header: <Token, (MinValue)>
Token: a byte (WriteByte(Byte)), first 7 bits are the number of bits per value (bitsPerValue). If the 8th bit is 1, then MinValue (see next) is 0, otherwise MinValue and needs to be decoded
MinValue: a zigzag-encoded variable-length System.Int64 (WriteVInt64(Int64)) whose value should be added to every int from the block to restore the original values
Ints: If the number of bits per value is 0, then there is nothing to decode and all ints are equal to MinValue. Otherwise: BlockSize packed ints (PackedInt32s) encoded on exactly bitsPerValue bits per value. They are the subtraction of the original values and MinValue

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

EliasFanoDecoder

A decoder for an EliasFanoEncoder.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

EliasFanoDocIdSet

A DocIdSet in Elias-Fano encoding.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

EliasFanoEncoder

Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.

The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of numValues > 0 natural numbers x[i]

0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound

where upperBound > 0 is an upper bound on the last value.

The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.

The lower L bits of each x[i] are stored explicitly and contiguously in the lower-bits array, with L chosen as (Log() base 2):

L = max(0, floor(log(upperBound/numValues)))

The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0):

(x[i]/2L) - (x[i-1]/2L)

The unary code encodes a natural number n by n 0 bits followed by a 1 bit: 0...01.

In the upper bits the total the number of 1 bits is numValues and the total number of 0 bits is:

floor(x[numValues-1]/2L) <= upperBound/(2max(0, floor(log(upperBound/numValues)))) <= 2numValues

The Elias-Fano encoding uses at most

2 + Ceil(Log(upperBound/numValues))

bits per encoded number. With upperBound in these bounds (p is an integer):

2p < x[numValues-1] <= upperBound <= 2*(p+1)

the number of bits per encoded number is minimized.

In this implementation the values in the sequence can be given as long, numValues = 0 and upperBound = 0 are allowed, and each of the upper and lower bit arrays should fit in a long[].

An index of positions of zero's in the upper bits is also built.

this implementation is based on this article:

Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300 .

The articles originally describing the Elias-Fano representation are:

Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246â€"260, 1974.

Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

GrowableWriter

Implements PackedInt32s.Mutable, but grows the bit count of the underlying packed ints on-demand.

Beware that this class will accept to set negative values but in order to do this, it will grow the number of bits per value to 64.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

MonotonicAppendingInt64Buffer

Utility class to buffer signed longs in memory, which is optimized for the case where the sequence is monotonic, although it can encode any sequence of arbitrary longs. It only supports appending.

NOTE: This was MonotonicAppendingLongBuffer in Lucene.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

MonotonicBlockPackedReader

Provides random access to a stream written with MonotonicBlockPackedWriter.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

MonotonicBlockPackedWriter

A writer for large monotonically increasing sequences of positive System.Int64s.

The sequence is divided into fixed-size blocks and for each block, values are modeled after a linear function f: x → A × x + B. The block encodes deltas from the expected values computed from this function using as few bits as possible. Each block has an overhead between 6 and 14 bytes.

Format:

<BLock>^BlockCount
BlockCount: ⌈ ValueCount / BlockSize ⌉
Block: <Header, (Ints)>
Header: <B, A, BitsPerValue>
B: the B from f: x → A × x + B using a variable-length System.Int64 (WriteVInt64(Int64))
A: the A from f: x → A × x + B encoded using J2N.BitConversion.SingleToInt32Bits(System.Single) on 4 bytes (WriteVInt32(Int32))
BitsPerValue: a variable-length System.Int32 (WriteVInt32(Int32))
Ints: if BitsPerValue is 0, then there is nothing to read and all values perfectly match the result of the function. Otherwise, these are the zigzag-encoded packed (PackedInt32s) deltas from the expected value (computed from the function) using exaclty BitsPerValue bits per value

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

Packed64

Space optimized random access capable array of values with a fixed number of bits/value. Values are packed contiguously.

The implementation strives to perform af fast as possible under the constraint of contiguous bits, by avoiding expensive operations. This comes at the cost of code clarity.

Technical details: this implementation is a refinement of a non-branching version. The non-branching get and set methods meant that 2 or 4 atomics in the underlying array were always accessed, even for the cases where only 1 or 2 were needed. Even with caching, this had a detrimental effect on performance. Related to this issue, the old implementation used lookup tables for shifts and masks, which also proved to be a bit slower than calculating the shifts and masks on the fly. See https://issues.apache.org/jira/browse/LUCENE-4062 for details.

PackedDataInput

A DataInput wrapper to read unaligned, variable-length packed integers. This API is much slower than the PackedInt32s fixed-length API but can be convenient to save space.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PackedDataOutput

A DataOutput wrapper to write unaligned, variable-length packed integers.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PackedInt32s

Simplistic compression for array of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.

NOTE: This was PackedInts in Lucene.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PackedInt32s.Format

A format to write packed System.Int32s.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PackedInt32s.FormatAndBits

Simple class that holds a format and a number of bits per value.

PackedInt32s.Header

Header identifying the structure of a packed integer array.

PackedInt32s.Mutable

A packed integer array that can be modified.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PackedInt32s.MutableImpl

PackedInt32s.NullReader

A PackedInt32s.Reader which has all its values equal to 0 (bitsPerValue = 0).

PackedInt32s.Reader

A read-only random access array of positive integers.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PackedInt32s.Writer

A write-once Writer.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PagedGrowableWriter

A PagedGrowableWriter. This class slices data into fixed-size blocks which have independent numbers of bits per value and grow on-demand.

You should use this class instead of the AbstractAppendingInt64Buffer related ones only when you need random write-access. Otherwise this class will likely be slower and less memory-efficient.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

PagedMutable

A PagedMutable. This class slices data into fixed-size blocks which have the same number of bits per value. It can be a useful replacement for PackedInt32s.Mutable to store more than 2B values.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

Namespace Lucene.Net.Util.Packed

Packed integer arrays and streams.

In-memory structures

Disk-based structures

Classes

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Note

Interfaces