• API

    Show / Hide Table of Contents

    Class EliasFanoEncoder

    Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.

    The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of numValues > 0 natural numbers x[i]

    0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound

    where upperBound > 0 is an upper bound on the last value.

    The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.

    The lower L bits of each x[i] are stored explicitly and contiguously in the lower-bits array, with L chosen as (Log() base 2):

    L = max(0, floor(log(upperBound/numValues)))

    The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0):

    (x[i]/2L) - (x[i-1]/2L)

    The unary code encodes a natural number n by n 0 bits followed by a 1 bit: 0...01.

    In the upper bits the total the number of 1 bits is numValues and the total number of 0 bits is:

    floor(x[numValues-1]/2L) <= upperBound/(2max(0, floor(log(upperBound/numValues)))) <= 2numValues

    The Elias-Fano encoding uses at most

    2 + Ceil(Log(upperBound/numValues))

    bits per encoded number. With upperBound in these bounds (p is an integer):

    2p < x[numValues-1] <= upperBound <= 2*(p+1)

    the number of bits per encoded number is minimized.

    In this implementation the values in the sequence can be given as long, numValues = 0 and upperBound = 0 are allowed, and each of the upper and lower bit arrays should fit in a long[].

    An index of positions of zero's in the upper bits is also built.

    this implementation is based on this article:

    Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300 .

    The articles originally describing the Elias-Fano representation are:

    Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246â€"260, 1974.

    Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.

    This is a Lucene.NET INTERNAL API, use at your own risk
    Inheritance
    System.Object
    EliasFanoEncoder
    Inherited Members
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Util.Packed
    Assembly: Lucene.Net.dll
    Syntax
    public class EliasFanoEncoder

    Constructors

    | Improve this Doc View Source

    EliasFanoEncoder(Int64, Int64)

    Construct an Elias-Fano encoder using DEFAULT_INDEX_INTERVAL.

    Declaration
    public EliasFanoEncoder(long numValues, long upperBound)
    Parameters
    Type Name Description
    System.Int64 numValues
    System.Int64 upperBound
    | Improve this Doc View Source

    EliasFanoEncoder(Int64, Int64, Int64)

    Construct an Elias-Fano encoder. After construction, call EncodeNext(Int64) numValues times to encode a non decreasing sequence of non negative numbers.

    Declaration
    public EliasFanoEncoder(long numValues, long upperBound, long indexInterval)
    Parameters
    Type Name Description
    System.Int64 numValues

    The number of values that is to be encoded.

    System.Int64 upperBound

    At least the highest value that will be encoded. For space efficiency this should not exceed the power of two that equals or is the first higher than the actual maximum.

    When numValues >= (upperBound/3) a FixedBitSet will take less space.

    System.Int64 indexInterval

    The number of high zero bits for which a single index entry is built. The index will have at most 2 * numValues / indexInterval entries and each index entry will use at most Ceil(Log2(3 * numValues)) bits, see EliasFanoEncoder.

    Exceptions
    Type Condition
    System.ArgumentException

    when:

    • numValues is negative, or
    • numValues is non negative and upperBound is negative, or
    • the low bits do not fit in a long[]: (L * numValues / 64) > System.Int32.MaxValue, or
    • the high bits do not fit in a long[]: (2 * numValues / 64) > System.Int32.MaxValue, or
    • indexInterval < 2,
    • the index bits do not fit in a long[]: (numValues / indexInterval * ceil(2log(3 * numValues)) / 64) > System.Int32.MaxValue.

    Fields

    | Improve this Doc View Source

    DEFAULT_INDEX_INTERVAL

    The default index interval for zero upper bits.

    Declaration
    public const long DEFAULT_INDEX_INTERVAL = 256L
    Field Value
    Type Description
    System.Int64

    Properties

    | Improve this Doc View Source

    IndexBits

    Expert. The index bits.

    Declaration
    public virtual long[] IndexBits { get; }
    Property Value
    Type Description
    System.Int64[]
    | Improve this Doc View Source

    LowerBits

    Expert. The low bits.

    Declaration
    public virtual long[] LowerBits { get; }
    Property Value
    Type Description
    System.Int64[]
    | Improve this Doc View Source

    UpperBits

    Expert. The high bits.

    Declaration
    public virtual long[] UpperBits { get; }
    Property Value
    Type Description
    System.Int64[]

    Methods

    | Improve this Doc View Source

    EncodeNext(Int64)

    Call at most Lucene.Net.Util.Packed.EliasFanoEncoder.numValues times to encode a non decreasing sequence of non negative numbers.

    Declaration
    public virtual void EncodeNext(long x)
    Parameters
    Type Name Description
    System.Int64 x

    The next number to be encoded.

    Exceptions
    Type Condition
    System.InvalidOperationException

    when called more than Lucene.Net.Util.Packed.EliasFanoEncoder.numValues times.

    System.ArgumentException

    when:

    • x is smaller than an earlier encoded value, or
    • x is larger than Lucene.Net.Util.Packed.EliasFanoEncoder.upperBound.

    | Improve this Doc View Source

    Equals(Object)

    Declaration
    public override bool Equals(object other)
    Parameters
    Type Name Description
    System.Object other
    Returns
    Type Description
    System.Boolean
    Overrides
    System.Object.Equals(System.Object)
    | Improve this Doc View Source

    GetDecoder()

    Returns an EliasFanoDecoder to access the encoded values. Perform all calls to EncodeNext(Int64) before calling GetDecoder().

    Declaration
    public virtual EliasFanoDecoder GetDecoder()
    Returns
    Type Description
    EliasFanoDecoder
    | Improve this Doc View Source

    GetHashCode()

    Declaration
    public override int GetHashCode()
    Returns
    Type Description
    System.Int32
    Overrides
    System.Object.GetHashCode()
    | Improve this Doc View Source

    SufficientlySmallerThanBitSet(Int64, Int64)

    Provide an indication that it is better to use an EliasFanoEncoder than a FixedBitSet to encode document identifiers. This indication is not precise and may change in the future.

    An EliasFanoEncoder is favored when the size of the encoding by the EliasFanoEncoder (including some space for its index) is at most about 5/6 of the size of the FixedBitSet, this is the same as comparing estimates of the number of bits accessed by a pair of FixedBitSets and by a pair of non indexed EliasFanoDocIdSets when determining the intersections of the pairs.

    A bit set is preferred when upperbound <= 256.

    It is assumed that DEFAULT_INDEX_INTERVAL is used.

    Declaration
    public static bool SufficientlySmallerThanBitSet(long numValues, long upperBound)
    Parameters
    Type Name Description
    System.Int64 numValues

    The number of document identifiers that is to be encoded. Should be non negative.

    System.Int64 upperBound

    The maximum possible value for a document identifier. Should be at least numValues.

    Returns
    Type Description
    System.Boolean
    | Improve this Doc View Source

    ToString()

    Declaration
    public override string ToString()
    Returns
    Type Description
    System.String
    Overrides
    System.Object.ToString()
    • Improve this Doc
    • View Source
    Back to top Copyright © 2020 Licensed to the Apache Software Foundation (ASF)