    Class NumericUtils

    This is a helper class to generate prefix-encoded representations for numerical values and supplies converters to represent float/double values as sortable integers/longs.

    To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.

    This class generates terms to achieve this: First, the numerical integer values need to be converted to bytes. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits per char. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding.
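
    The encoding steps described above can be sketched in a few lines. This is an illustrative, standalone approximation and not the library's source: the class PrefixCodingSketch and its Encode method are hypothetical names, and only the constant value 96 is taken from SHIFT_START_INT32 as documented on this page.

```csharp
using System;

public static class PrefixCodingSketch
{
    // Same value as NumericUtils.SHIFT_START_INT32 documented on this page.
    private const byte ShiftStartInt32 = 96;

    // Hypothetical helper: encodes val after dropping the lowest `shift` bits.
    public static byte[] Encode(int val, int shift)
    {
        if (shift < 0 || shift > 31)
            throw new ArgumentOutOfRangeException(nameof(shift));

        // Number of 7-bit groups needed for the remaining (32 - shift) bits.
        int groups = (31 - shift) / 7 + 1;
        var bytes = new byte[groups + 1];

        // The first byte carries the shift, so terms of different precision differ.
        bytes[0] = (byte)(ShiftStartInt32 + shift);

        // Flip the sign bit so that signed order becomes unsigned order,
        // then drop the low bits.
        uint sortable = unchecked((uint)val) ^ 0x80000000u;
        sortable >>= shift;

        // Fill from the end; each byte holds 7 bits and stays in the ASCII range.
        for (int i = groups; i > 0; i--)
        {
            bytes[i] = (byte)(sortable & 0x7F);
            sortable >>= 7;
        }
        return bytes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Encode(0, 0)));   // 96,8,0,0,0,0
        Console.WriteLine(string.Join(",", Encode(-1, 0)));  // 96,7,127,127,127,127
    }
}
```

    In this sketch the bytes for -1 compare lower than the bytes for 0, which is the property that lets the encoded terms sort like the original integers.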

    To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: DoubleToSortableInt64(double), SingleToSortableInt32(float). You will have no precision loss by converting floating point numbers to integers and back (although the integer form is not meaningful as a numeric value). Other data types like dates can easily be converted to longs or ints (e.g. date to long: Ticks).

    For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream that can index int, long, float, and double. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.
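
    As a rough illustration of the higher-level types named above, the following sketch indexes an int value and builds a matching range query. The field name "price" and the class NumericFieldSketch are hypothetical; the Int32Field constructor and the NumericRangeQuery.NewInt32Range factory are assumed to have the shapes shown, so check them against the version of Lucene.Net you reference.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Search;
using Lucene.Net.Util;

public static class NumericFieldSketch
{
    // Builds a document with a trie-encoded int field (hypothetical field name).
    public static Document BuildDocument(int price)
    {
        var doc = new Document();
        doc.Add(new Int32Field("price", price, Field.Store.NO));
        return doc;
    }

    // Builds an inclusive range query over the same field, using the same
    // precision step that the field is assumed to have been indexed with.
    public static Query BuildPriceQuery(int min, int max)
    {
        return NumericRangeQuery.NewInt32Range(
            "price", NumericUtils.PRECISION_STEP_DEFAULT, min, max, true, true);
    }
}
```

    The precision step used for querying should match the one used for indexing; passing PRECISION_STEP_DEFAULT on both sides is the simplest way to keep them aligned.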

    This class can also be used to generate lexicographically sortable (according to UTF8SortedAsUTF16Comparer) representations of numeric data types for other usages (e.g. sorting).

    Note

    This API is for internal purposes only and might change in incompatible ways in the next release.

    Since 2.9; the API changed in a non-backwards-compatible way in 4.0.
    Inheritance
    object
    NumericUtils
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Util
    Assembly: Lucene.Net.dll
    Syntax
    public static class NumericUtils

    Fields

    BUF_SIZE_INT32

    The maximum term length (used for byte[] buffer size) for encoding int values.

    NOTE: This was BUF_SIZE_INT in Lucene
    Declaration
    public const int BUF_SIZE_INT32 = 6
    Field Value
    Type Description
    int
    See Also
    Int32ToPrefixCodedBytes(int, int, BytesRef)

    BUF_SIZE_INT64

    The maximum term length (used for byte[] buffer size) for encoding long values.

    NOTE: This was BUF_SIZE_LONG in Lucene
    Declaration
    public const int BUF_SIZE_INT64 = 11
    Field Value
    Type Description
    int
    See Also
    Int64ToPrefixCodedBytes(long, int, BytesRef)

    PRECISION_STEP_DEFAULT

    The default precision step used by Int32Field, SingleField, Int64Field, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.

    Declaration
    public const int PRECISION_STEP_DEFAULT = 4
    Field Value
    Type Description
    int

    SHIFT_START_INT32

    Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT32+shift in the first byte.

    NOTE: This was SHIFT_START_INT in Lucene
    Declaration
    public const byte SHIFT_START_INT32 = 96
    Field Value
    Type Description
    byte

    SHIFT_START_INT64

    Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT64+shift in the first byte.

    NOTE: This was SHIFT_START_LONG in Lucene
    Declaration
    public const char SHIFT_START_INT64 = ' '
    Field Value
    Type Description
    char

    Methods

    DoubleToSortableInt64(double)

    Converts a double value to a sortable signed long. The value is converted by getting its IEEE 754 floating-point "double format" bit layout and then swapping some bits, so that the result can be compared as a long. This does not reduce the precision, and the value can easily be used as a long. The sort order (including NaN) is defined by CompareTo(double); NaN is greater than positive infinity.

    NOTE: This was doubleToSortableLong() in Lucene
    Declaration
    public static long DoubleToSortableInt64(double val)
    Parameters
    Type Name Description
    double val
    Returns
    Type Description
    long
    See Also
    SortableInt64ToDouble(long)
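
    One common way to obtain such a sortable form is shown below. This is a standalone sketch under stated assumptions, not the library's code: the names SortableDoubleSketch, ToSortable, and FromSortable are hypothetical, and the sketch does not normalize NaN bit patterns, which the actual method may do.

```csharp
using System;

public static class SortableDoubleSketch
{
    public static long ToSortable(double val)
    {
        long bits = BitConverter.DoubleToInt64Bits(val);
        // For negative values (sign bit set), invert the other 63 bits so that
        // larger magnitudes compare lower, as they do for doubles.
        if (bits < 0) bits ^= 0x7FFFFFFFFFFFFFFFL;
        return bits;
    }

    public static double FromSortable(long val)
    {
        // The transformation is its own inverse.
        if (val < 0) val ^= 0x7FFFFFFFFFFFFFFFL;
        return BitConverter.Int64BitsToDouble(val);
    }

    public static void Main()
    {
        double[] samples = { double.NegativeInfinity, -2.5, 0.0, 1e-300, 3.0, double.PositiveInfinity };
        for (int i = 1; i < samples.Length; i++)
        {
            // Each line prints True: the long order follows the double order.
            Console.WriteLine(ToSortable(samples[i - 1]) < ToSortable(samples[i]));
        }
        // Prints True: the round trip loses no precision.
        Console.WriteLine(FromSortable(ToSortable(-2.5)) == -2.5);
    }
}
```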

    FilterPrefixCodedInt32s(TermsEnum)

    Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.

    NOTE: This was filterPrefixCodedInts() in Lucene
    Declaration
    public static TermsEnum FilterPrefixCodedInt32s(TermsEnum termsEnum)
    Parameters
    Type Name Description
    TermsEnum termsEnum

    The terms enum to filter

    Returns
    Type Description
    TermsEnum

    A filtered TermsEnum that only returns prefix coded 32 bit terms with a shift value of 0.

    FilterPrefixCodedInt64s(TermsEnum)

    Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.

    NOTE: This was filterPrefixCodedLongs() in Lucene
    Declaration
    public static TermsEnum FilterPrefixCodedInt64s(TermsEnum termsEnum)
    Parameters
    Type Name Description
    TermsEnum termsEnum

    The terms enum to filter

    Returns
    Type Description
    TermsEnum

    A filtered TermsEnum that only returns prefix coded 64 bit terms with a shift value of 0.

    GetPrefixCodedInt32Shift(BytesRef)

    Returns the shift value from a prefix encoded int.

    NOTE: This was getPrefixCodedIntShift() in Lucene
    Declaration
    public static int GetPrefixCodedInt32Shift(BytesRef val)
    Parameters
    Type Name Description
    BytesRef val
    Returns
    Type Description
    int
    Exceptions
    Type Condition
    FormatException

    if the supplied BytesRef is not correctly prefix encoded.

    GetPrefixCodedInt64Shift(BytesRef)

    Returns the shift value from a prefix encoded long.

    NOTE: This was getPrefixCodedLongShift() in Lucene
    Declaration
    public static int GetPrefixCodedInt64Shift(BytesRef val)
    Parameters
    Type Name Description
    BytesRef val
    Returns
    Type Description
    int
    Exceptions
    Type Condition
    FormatException

    if the supplied BytesRef is not correctly prefix encoded.

    Int32ToPrefixCoded(int, int, BytesRef)

    Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

    NOTE: This was intToPrefixCoded() in Lucene
    Declaration
    public static void Int32ToPrefixCoded(int val, int shift, BytesRef bytes)
    Parameters
    Type Name Description
    int val

    The numeric value

    int shift

    How many bits to strip from the right

    BytesRef bytes

    Will contain the encoded value

    Int32ToPrefixCodedBytes(int, int, BytesRef)

    Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

    NOTE: This was intToPrefixCodedBytes() in Lucene
    Declaration
    public static void Int32ToPrefixCodedBytes(int val, int shift, BytesRef bytes)
    Parameters
    Type Name Description
    int val

    The numeric value

    int shift

    How many bits to strip from the right

    BytesRef bytes

    Will contain the encoded value

    Int64ToPrefixCoded(long, int, BytesRef)

    Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

    NOTE: This was longToPrefixCoded() in Lucene
    Declaration
    public static void Int64ToPrefixCoded(long val, int shift, BytesRef bytes)
    Parameters
    Type Name Description
    long val

    The numeric value

    int shift

    How many bits to strip from the right

    BytesRef bytes

    Will contain the encoded value

    Int64ToPrefixCodedBytes(long, int, BytesRef)

    Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

    NOTE: This was longToPrefixCodedBytes() in Lucene
    Declaration
    public static void Int64ToPrefixCodedBytes(long val, int shift, BytesRef bytes)
    Parameters
    Type Name Description
    long val

    The numeric value

    int shift

    How many bits to strip from the right

    BytesRef bytes

    Will contain the encoded value

    PrefixCodedToInt32(BytesRef)

    Returns an int from prefix coded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

    NOTE: This was prefixCodedToInt() in Lucene
    Declaration
    public static int PrefixCodedToInt32(BytesRef val)
    Parameters
    Type Name Description
    BytesRef val
    Returns
    Type Description
    int
    Exceptions
    Type Condition
    FormatException

    if the supplied BytesRef is not correctly prefix encoded.

    See Also
    Int32ToPrefixCodedBytes(int, int, BytesRef)
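
    A short round trip may help to show how encoding, shift lookup, and decoding fit together. The sketch below uses only the methods documented on this page plus a BytesRef constructor that takes a capacity, which is an assumption here; the class name PrefixCodedRoundTrip is hypothetical, and the expected values in the comments follow from the description above that the rightmost bits are zero for lower precision codes.

```csharp
using System;
using Lucene.Net.Util;

public static class PrefixCodedRoundTrip
{
    public static void Main()
    {
        // Buffer sized for the longest possible int term.
        var bytes = new BytesRef(NumericUtils.BUF_SIZE_INT32);

        // Full precision: the value decodes unchanged.
        NumericUtils.Int32ToPrefixCoded(0x12345678, 0, bytes);
        Console.WriteLine(NumericUtils.GetPrefixCodedInt32Shift(bytes));          // expected: 0
        Console.WriteLine(NumericUtils.PrefixCodedToInt32(bytes).ToString("X8")); // expected: 12345678

        // Reduced precision: the lowest 8 bits are stripped and decode as zero.
        NumericUtils.Int32ToPrefixCoded(0x12345678, 8, bytes);
        Console.WriteLine(NumericUtils.GetPrefixCodedInt32Shift(bytes));          // expected: 8
        Console.WriteLine(NumericUtils.PrefixCodedToInt32(bytes).ToString("X8")); // expected: 12345600
    }
}
```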

    PrefixCodedToInt64(BytesRef)

    Returns a long from prefix coded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

    NOTE: This was prefixCodedToLong() in Lucene
    Declaration
    public static long PrefixCodedToInt64(BytesRef val)
    Parameters
    Type Name Description
    BytesRef val
    Returns
    Type Description
    long
    Exceptions
    Type Condition
    FormatException

    if the supplied BytesRef is not correctly prefix encoded.

    See Also
    Int64ToPrefixCodedBytes(long, int, BytesRef)

    SingleToSortableInt32(float)

    Converts a float value to a sortable signed int. The value is converted by getting its IEEE 754 floating-point "float format" bit layout and then swapping some bits, so that the result can be compared as an int. This does not reduce the precision, and the value can easily be used as an int. The sort order (including NaN) is defined by CompareTo(float); NaN is greater than positive infinity.

    NOTE: This was floatToSortableInt() in Lucene
    Declaration
    public static int SingleToSortableInt32(float val)
    Parameters
    Type Name Description
    float val
    Returns
    Type Description
    int
    See Also
    SortableInt32ToSingle(int)

    SortableInt32ToSingle(int)

    Converts a sortable int back to a float.

    NOTE: This was sortableIntToFloat() in Lucene
    Declaration
    public static float SortableInt32ToSingle(int val)
    Parameters
    Type Name Description
    int val
    Returns
    Type Description
    float
    See Also
    SingleToSortableInt32(float)

    SortableInt64ToDouble(long)

    Converts a sortable long back to a double.

    NOTE: This was sortableLongToDouble() in Lucene
    Declaration
    public static double SortableInt64ToDouble(long val)
    Parameters
    Type Name Description
    long val
    Returns
    Type Description
    double
    See Also
    DoubleToSortableInt64(double)

    SplitInt32Range(Int32RangeBuilder, int, int, int)

    Splits an int range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its AddRange(BytesRef, BytesRef) method.

    This method is used by NumericRangeQuery.

    NOTE: This was splitIntRange() in Lucene
    Declaration
    public static void SplitInt32Range(NumericUtils.Int32RangeBuilder builder, int precisionStep, int minBound, int maxBound)
    Parameters
    Type Name Description
    NumericUtils.Int32RangeBuilder builder
    int precisionStep
    int minBound
    int maxBound
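
    To see which sub-ranges the split produces, a builder can simply record each call. The sketch below is an assumption-laden illustration: CollectingRangeBuilder is a hypothetical class, and it assumes that Int32RangeBuilder also exposes an overridable AddRange(int, int, int) overload (minimum, maximum, shift) in addition to the AddRange(BytesRef, BytesRef) overload mentioned above.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Util;

// Hypothetical builder that records every sub-range instead of building a query.
public sealed class CollectingRangeBuilder : NumericUtils.Int32RangeBuilder
{
    public List<(int Min, int Max, int Shift)> Ranges { get; } =
        new List<(int Min, int Max, int Shift)>();

    // Assumed overload: called once per sub-range with its precision (shift).
    public override void AddRange(int min, int max, int shift)
    {
        Ranges.Add((min, max, shift));
    }
}

public static class SplitRangeDemo
{
    public static void Main()
    {
        var builder = new CollectingRangeBuilder();
        NumericUtils.SplitInt32Range(builder, NumericUtils.PRECISION_STEP_DEFAULT, 10, 1000);

        // Boundaries are expected at low shifts, the center at higher shifts.
        foreach (var r in builder.Ranges)
        {
            Console.WriteLine($"shift={r.Shift} min={r.Min} max={r.Max}");
        }
    }
}
```

    A builder that feeds a BooleanQuery would follow the same pattern, adding one clause per recorded range instead of storing it in a list.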

    SplitInt64Range(Int64RangeBuilder, int, long, long)

    Splits a long range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its AddRange(BytesRef, BytesRef) method.

    This method is used by NumericRangeQuery.

    NOTE: This was splitLongRange() in Lucene
    Declaration
    public static void SplitInt64Range(NumericUtils.Int64RangeBuilder builder, int precisionStep, long minBound, long maxBound)
    Parameters
    Type Name Description
    NumericUtils.Int64RangeBuilder builder
    int precisionStep
    long minBound
    long maxBound
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.