Class EliasFanoEncoder
Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.
The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence ofnumValues > 0
natural numbers x[i]
0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound
where upperBound > 0
is an upper bound on the last value.
The Elias-Fano encoding uses less than half a bit per encoded number more
than the smallest representation
that can encode any monotone sequence with the same bounds.
The lower L
bits of each x[i]
are stored explicitly and contiguously
in the lower-bits array, with L
chosen as (Log()
base 2):
L = max(0, floor(log(upperBound/numValues)))
The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0
):
(x[i]/2**L) - (x[i-1]/2**L)
The unary code encodes a natural number n
by n
0 bits followed by a 1 bit:
0...01
.
In the upper bits the total the number of 1 bits is numValues
and the total number of 0 bits is:
floor(x[numValues-1]/2**L) <= upperBound/(2**max(0, floor(log(upperBound/numValues)))) <= 2*numValues
The Elias-Fano encoding uses at most
2 + Ceil(Log(upperBound/numValues))
bits per encoded number. With upperBound
in these bounds (p
is an integer):
2**p < x[numValues-1] <= upperBound <= 2**(p+1)
the number of bits per encoded number is minimized.
In this implementation the values in the sequence can be given as long
,
numValues = 0
and upperBound = 0
are allowed,
and each of the upper and lower bit arrays should fit in a long[]
.
An index of positions of zero's in the upper bits is also built.
this implementation is based on this article:
Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9.
Retrieved from http://arxiv.org/pdf/1206.4300 .
The articles originally describing the Elias-Fano representation are:
Peter Elias, "Efficient storage and retrieval by content and address of static files",
J. Assoc. Comput. Mach., 21(2):246â€"260, 1974.
Robert M. Fano, "On the number of bits required to implement an associative memory",
Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
Inherited Members
Namespace: Lucene.Net.Util.Packed
Assembly: Lucene.Net.dll
Syntax
public class EliasFanoEncoder
Constructors
EliasFanoEncoder(long, long)
Construct an Elias-Fano encoder using DEFAULT_INDEX_INTERVAL.
Declaration
public EliasFanoEncoder(long numValues, long upperBound)
Parameters
Type | Name | Description |
---|---|---|
long | numValues | |
long | upperBound |
EliasFanoEncoder(long, long, long)
Construct an Elias-Fano encoder.
After construction, call EncodeNext(long)numValues
times to encode
a non decreasing sequence of non negative numbers.
Declaration
public EliasFanoEncoder(long numValues, long upperBound, long indexInterval)
Parameters
Type | Name | Description |
---|---|---|
long | numValues | The number of values that is to be encoded. |
long | upperBound | At least the highest value that will be encoded. For space efficiency this should not exceed the power of two that equals or is the first higher than the actual maximum. WhennumValues >= (upperBound/3)
a FixedBitSet will take less space.
|
long | indexInterval | The number of high zero bits for which a single index entry is built.
The index will have at most |
Exceptions
Type | Condition |
---|---|
ArgumentException | when:
|
Fields
DEFAULT_INDEX_INTERVAL
The default index interval for zero upper bits.
Declaration
public const long DEFAULT_INDEX_INTERVAL = 256
Field Value
Type | Description |
---|---|
long |
Properties
IndexBits
Expert. The index bits.
Declaration
public virtual long[] IndexBits { get; }
Property Value
Type | Description |
---|---|
long[] |
LowerBits
Expert. The low bits.
Declaration
public virtual long[] LowerBits { get; }
Property Value
Type | Description |
---|---|
long[] |
UpperBits
Expert. The high bits.
Declaration
public virtual long[] UpperBits { get; }
Property Value
Type | Description |
---|---|
long[] |
Methods
EncodeNext(long)
Call at most numValues times to encode a non decreasing sequence of non negative numbers.
Declaration
public virtual void EncodeNext(long x)
Parameters
Type | Name | Description |
---|---|---|
long | x | The next number to be encoded. |
Exceptions
Type | Condition |
---|---|
InvalidOperationException | when called more than numValues times. |
ArgumentOutOfRangeException | when:
|
Equals(object)
Determines whether the specified object is equal to the current object.
Declaration
public override bool Equals(object other)
Parameters
Type | Name | Description |
---|---|---|
object | other |
Returns
Type | Description |
---|---|
bool | true if the specified object is equal to the current object; otherwise, false. |
Overrides
GetDecoder()
Returns an EliasFanoDecoder to access the encoded values. Perform all calls to EncodeNext(long) before calling GetDecoder().
Declaration
public virtual EliasFanoDecoder GetDecoder()
Returns
Type | Description |
---|---|
EliasFanoDecoder |
GetHashCode()
Serves as the default hash function.
Declaration
public override int GetHashCode()
Returns
Type | Description |
---|---|
int | A hash code for the current object. |
Overrides
SufficientlySmallerThanBitSet(long, long)
Provide an indication that it is better to use an EliasFanoEncoder than a FixedBitSet to encode document identifiers. This indication is not precise and may change in the future.
An EliasFanoEncoder is favored when the size of the encoding by the EliasFanoEncoder (including some space for its index) is at most about 5/6 of the size of the FixedBitSet, this is the same as comparing estimates of the number of bits accessed by a pair of FixedBitSets and by a pair of non indexed EliasFanoDocIdSets when determining the intersections of the pairs. A bit set is preferred whenupperbound <= 256
.
It is assumed that DEFAULT_INDEX_INTERVAL is used.
Declaration
public static bool SufficientlySmallerThanBitSet(long numValues, long upperBound)
Parameters
Type | Name | Description |
---|---|---|
long | numValues | The number of document identifiers that is to be encoded. Should be non negative. |
long | upperBound | The maximum possible value for a document identifier. Should be at least |
Returns
Type | Description |
---|---|
bool |
ToString()
Returns a string that represents the current object.
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
string | A string that represents the current object. |