Class NumericUtils

A helper class that generates prefix-encoded representations of numerical values and supplies converters that represent float/double values as sortable integers/longs.

To execute range queries quickly in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.

This class generates the terms needed to achieve this. First, the numerical integer values are converted to bytes: integer values (32-bit or 64-bit) are made unsigned and their bits are converted to ASCII chars, 7 bits per char. The resulting byte[] sorts like the original integer value (even under UTF-8 sort order). Each value is also prefixed (in the first char) with the shift value (the number of bits removed) used during encoding.
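
The encoding steps described above can be sketched in plain C#. This is a minimal illustration that assumes the layout stated on this page (a leading byte of SHIFT_START_INT32 + shift, followed by 7 payload bits per byte); the class and method names are hypothetical, and this is not the library's source.

```csharp
using System;

// Hypothetical sketch of the prefix encoding described in the text.
public static class PrefixCodingSketch
{
    // Value of SHIFT_START_INT32 as listed on this page.
    private const byte ShiftStartInt32 = 96;

    // Encodes val after dropping `shift` low-order bits.
    public static byte[] Encode(int val, int shift)
    {
        if (shift < 0 || shift > 31)
            throw new ArgumentOutOfRangeException(nameof(shift), "shift must be 0..31");

        int nChars = (31 - shift) / 7 + 1;      // payload bytes, 7 bits each
        var result = new byte[nChars + 1];      // +1 for the leading shift byte
        result[0] = (byte)(ShiftStartInt32 + shift);

        // Flip the sign bit so that unsigned order matches signed order.
        uint sortableBits = (uint)val ^ 0x80000000u;
        sortableBits >>= shift;

        // Fill from the last byte backwards, 7 bits at a time.
        for (int i = nChars; i > 0; i--)
        {
            result[i] = (byte)(sortableBits & 0x7Fu);
            sortableBits >>= 7;
        }
        return result;
    }
}
```

Under these assumptions, `PrefixCodingSketch.Encode(0, 0)` yields the six bytes 96, 8, 0, 0, 0, 0, while `Encode(-1, 0)` yields 96, 7, 127, 127, 127, 127, so the encoded form of -1 sorts before the encoded form of 0.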

To index floating point numbers as well, this class supplies two methods that convert them to integer values by changing their bit layout: DoubleToSortableInt64(Double) and SingleToSortableInt32(Single). Converting floating point numbers to integers and back loses no precision (although the integer form is not usable as a numeric value itself). Other data types such as dates can easily be converted to System.Int64 or System.Int32 values (e.g. date to long: System.DateTime.Ticks).
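
As an illustration of the date-to-long conversion mentioned above, the following minimal sketch maps a System.DateTime to its tick count and back. The class and method names are hypothetical, and the sketch assumes the caller has already normalized all values to UTC so that tick order matches chronological order.

```csharp
using System;

// Hypothetical helper: turns dates into sortable longs via DateTime.Ticks.
public static class DateIndexValueSketch
{
    // Assumes `value` is already expressed in UTC.
    public static long ToInt64(DateTime value)
    {
        return value.Ticks;
    }

    // Rebuilds the date from the stored tick count, tagging it as UTC.
    public static DateTime FromInt64(long ticks)
    {
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```

The resulting long can then be indexed like any other System.Int64 value, and a range over dates becomes a range over tick counts.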

For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream that can index System.Int32, System.Int64, System.Single, and System.Double. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.

This class can also be used to generate lexicographically sortable (according to UTF8SortedAsUTF16Comparer) representations of numeric data types for other uses (e.g. sorting).
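
To make the sortability claim concrete, here is a minimal, self-contained sketch for 64-bit values. It assumes the layout described on this page (a leading byte of SHIFT_START_INT64 + shift, then 7 bits per byte) and uses a plain unsigned byte-wise comparison as a stand-in for the comparer named above; all names are hypothetical and this is not the library's implementation.

```csharp
using System;

// Hypothetical sketch: 64-bit prefix encoding plus a byte-wise comparison.
public static class SortableInt64Sketch
{
    // ' ' (0x20), the value of SHIFT_START_INT64 listed on this page.
    private const byte ShiftStartInt64 = 32;

    public static byte[] Encode(long val, int shift)
    {
        if (shift < 0 || shift > 63)
            throw new ArgumentOutOfRangeException(nameof(shift), "shift must be 0..63");

        int nChars = (63 - shift) / 7 + 1;      // payload bytes, 7 bits each
        var result = new byte[nChars + 1];      // +1 for the leading shift byte
        result[0] = (byte)(ShiftStartInt64 + shift);

        // Flip the sign bit so that unsigned order matches signed order.
        ulong sortableBits = (ulong)val ^ 0x8000000000000000UL;
        sortableBits >>= shift;

        for (int i = nChars; i > 0; i--)
        {
            result[i] = (byte)(sortableBits & 0x7FUL);
            sortableBits >>= 7;
        }
        return result;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    public static int CompareBytes(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i]) return a[i].CompareTo(b[i]);
        }
        return a.Length.CompareTo(b.Length);
    }
}
```

With this sketch, sorting the encoded forms of long.MinValue, -1, 0, 1 and long.MaxValue by `CompareBytes` gives the same order as sorting the numbers themselves.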

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

Since 2.9; the API changed in a non-backwards-compatible way in 4.0.

Inheritance

System.Object

NumericUtils

Inherited Members

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: Lucene.Net.Util

Assembly: Lucene.Net.dll

Syntax

public static class NumericUtils

Fields

BUF_SIZE_INT32

The maximum term length (used for byte[] buffer size) for encoding System.Int32 values.

NOTE: This was BUF_SIZE_INT in Lucene

Declaration

public const int BUF_SIZE_INT32 = 6

Field Value

Type	Description
System.Int32

BUF_SIZE_INT64

The maximum term length (used for byte[] buffer size) for encoding System.Int64 values.

NOTE: This was BUF_SIZE_LONG in Lucene

Declaration

public const int BUF_SIZE_INT64 = 11

Field Value

Type	Description
System.Int32
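
The two buffer sizes are consistent with the encoding described in the class summary: one leading byte for the shift plus enough 7-bit bytes to hold every bit of the value. The arithmetic can be checked with the small sketch below; the helper name is hypothetical and the formula is inferred from that description, not taken from the library source.

```csharp
// Hypothetical helper that reproduces the buffer-size arithmetic.
public static class BufferSizeSketch
{
    // valueBits is 32 for System.Int32 and 64 for System.Int64.
    public static int MaxTermLength(int valueBits)
    {
        int payloadBytes = (valueBits + 6) / 7; // ceil(valueBits / 7)
        return 1 + payloadBytes;                // +1 for the shift byte
    }
}
```

For 32 bits this gives 1 + 5 = 6 and for 64 bits 1 + 10 = 11, matching BUF_SIZE_INT32 and BUF_SIZE_INT64.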

PRECISION_STEP_DEFAULT

The default precision step used by Int32Field, SingleField, Int64Field, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.

Declaration

public const int PRECISION_STEP_DEFAULT = 4

Field Value

Type	Description
System.Int32
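
The precision step determines how many terms of decreasing precision are produced for each value. The sketch below lists the shift values involved, assuming (as the trie description in the class summary suggests) that one term is generated per shift of 0, precisionStep, 2 * precisionStep, and so on, below the bit width of the value. The helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: enumerates the shift values for one indexed number.
public static class PrecisionStepSketch
{
    public static List<int> ShiftsFor(int valueBits, int precisionStep)
    {
        if (precisionStep < 1)
            throw new ArgumentOutOfRangeException(nameof(precisionStep), "must be >= 1");

        var shifts = new List<int>();
        for (int shift = 0; shift < valueBits; shift += precisionStep)
        {
            shifts.Add(shift);
        }
        return shifts;
    }
}
```

Under that assumption, the default step of 4 corresponds to 8 shift values for a 32-bit number (0, 4, ..., 28) and 16 for a 64-bit number. A smaller step means more terms per value but fewer terms to visit per range query.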

SHIFT_START_INT32

Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT32+shift in the first byte.

NOTE: This was SHIFT_START_INT in Lucene

Declaration

public const byte SHIFT_START_INT32 = 96

Field Value

Type	Description
System.Byte

SHIFT_START_INT64

Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT64+shift in the first byte.

NOTE: This was SHIFT_START_LONG in Lucene

Declaration

public const char SHIFT_START_INT64 = ' '

Field Value

Type	Description
System.Char

Methods

DoubleToSortableInt64(Double)

Converts a System.Double value to a sortable signed System.Int64. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits so that the result can be compared as a System.Int64. The precision is not reduced by this, but the value can easily be used as a System.Int64. The sort order (including System.Double.NaN) follows the semantics of the original Java Double.compareTo: NaN is greater than positive infinity. Note that System.Double.CompareTo(System.Double) itself orders NaN below all other values.

NOTE: This was doubleToSortableLong() in Lucene

Declaration

public static long DoubleToSortableInt64(double val)

Parameters

Type	Name	Description
System.Double	val

Returns

Type	Description
System.Int64
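
The bit manipulation described above can be sketched with the base class library alone. This is an illustrative approximation, not the library's source: the names are hypothetical, and the sketch explicitly maps every NaN to a single positive bit pattern (an assumption made here so that NaN sorts above positive infinity, as the description states).

```csharp
using System;

// Hypothetical sketch of a double <-> sortable long conversion.
public static class SortableDoubleSketch
{
    public static long ToSortable(double val)
    {
        // Assumption: collapse all NaN payloads to one canonical pattern.
        long bits = double.IsNaN(val)
            ? 0x7FF8000000000000L
            : BitConverter.DoubleToInt64Bits(val);

        // For negative doubles, invert all bits except the sign bit so that
        // larger magnitudes become smaller longs.
        if (bits < 0) bits ^= 0x7FFFFFFFFFFFFFFFL;
        return bits;
    }

    public static double FromSortable(long val)
    {
        // The same XOR undoes the transformation.
        if (val < 0) val ^= 0x7FFFFFFFFFFFFFFFL;
        return BitConverter.Int64BitsToDouble(val);
    }
}
```

In this sketch, negative infinity, -1.5, -0.0, 0.0, 1.0, positive infinity and NaN map to strictly increasing longs, and `FromSortable(ToSortable(x))` returns `x` for every non-NaN input.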

FilterPrefixCodedInt32s(TermsEnum)

Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.

NOTE: This was filterPrefixCodedInts() in Lucene

Declaration

public static TermsEnum FilterPrefixCodedInt32s(TermsEnum termsEnum)

Parameters

Type	Name	Description
TermsEnum	termsEnum	The terms enum to filter

Returns

Type	Description
TermsEnum	A filtered TermsEnum that only returns prefix coded 32 bit terms with a shift value of `0`.

FilterPrefixCodedInt64s(TermsEnum)

Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.

NOTE: This was filterPrefixCodedLongs() in Lucene

Declaration

public static TermsEnum FilterPrefixCodedInt64s(TermsEnum termsEnum)

Parameters

Type	Name	Description
TermsEnum	termsEnum	The terms enum to filter

Returns

Type	Description
TermsEnum	A filtered TermsEnum that only returns prefix coded 64 bit terms with a shift value of `0`.

GetPrefixCodedInt32Shift(BytesRef)

Returns the shift value from a prefix encoded System.Int32.

NOTE: This was getPrefixCodedIntShift() in Lucene

Declaration

public static int GetPrefixCodedInt32Shift(BytesRef val)

Parameters

Type	Name	Description
BytesRef	val

Returns

Type	Description
System.Int32

Exceptions

Type	Condition
System.FormatException	if the supplied BytesRef is not correctly prefix encoded.
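
Reading the shift back amounts to subtracting the start constant from the first byte and checking the result. The sketch below works on a raw byte[] rather than a BytesRef; the names are hypothetical, and the validation is inferred from the constants on this page rather than copied from the library.

```csharp
using System;

// Hypothetical sketch: recovers the shift from the first byte of a term.
public static class ShiftPrefixSketch
{
    // Value of SHIFT_START_INT32 as listed on this page.
    private const int ShiftStartInt32 = 96;

    public static int GetInt32Shift(byte[] term)
    {
        if (term == null || term.Length == 0)
            throw new FormatException("Empty term.");

        int shift = term[0] - ShiftStartInt32;
        if (shift < 0 || shift > 31)
            throw new FormatException("Invalid shift value; is the term really a 32-bit value?");
        return shift;
    }
}
```

With the constants on this page, 32-bit terms start with a byte in 96..127 and 64-bit terms with a byte in 32..95, so the two kinds cannot be confused and the first byte always stays within 7-bit ASCII.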

GetPrefixCodedInt64Shift(BytesRef)

Returns the shift value from a prefix encoded System.Int64.

NOTE: This was getPrefixCodedLongShift() in Lucene

Declaration

public static int GetPrefixCodedInt64Shift(BytesRef val)

Parameters

Type	Name	Description
BytesRef	val

Returns

Type	Description
System.Int32

Exceptions

Type	Condition
System.FormatException	if the supplied BytesRef is not correctly prefix encoded.

Int32ToPrefixCoded(Int32, Int32, BytesRef)

Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

NOTE: This was intToPrefixCoded() in Lucene

Declaration

public static void Int32ToPrefixCoded(int val, int shift, BytesRef bytes)

Parameters

Type	Name	Description
System.Int32	val	The numeric value
System.Int32	shift	How many bits to strip from the right
BytesRef	bytes	Will contain the encoded value

Int32ToPrefixCodedBytes(Int32, Int32, BytesRef)

Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

NOTE: This was intToPrefixCodedBytes() in Lucene

Declaration

public static void Int32ToPrefixCodedBytes(int val, int shift, BytesRef bytes)

Parameters

Type	Name	Description
System.Int32	val	The numeric value
System.Int32	shift	How many bits to strip from the right
BytesRef	bytes	Will contain the encoded value

Int64ToPrefixCoded(Int64, Int32, BytesRef)

Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

NOTE: This was longToPrefixCoded() in Lucene

Declaration

public static void Int64ToPrefixCoded(long val, int shift, BytesRef bytes)

Parameters

Type	Name	Description
System.Int64	val	The numeric value
System.Int32	shift	How many bits to strip from the right
BytesRef	bytes	Will contain the encoded value

Int64ToPrefixCodedBytes(Int64, Int32, BytesRef)

Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.Offset will always be 0.

NOTE: This was longToPrefixCodedBytes() in Lucene

Declaration

public static void Int64ToPrefixCodedBytes(long val, int shift, BytesRef bytes)

Parameters

Type	Name	Description
System.Int64	val	The numeric value
System.Int32	shift	How many bits to strip from the right
BytesRef	bytes	Will contain the encoded value

PrefixCodedToInt32(BytesRef)

Returns a System.Int32 from prefixCoded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

NOTE: This was prefixCodedToInt() in Lucene

Declaration

public static int PrefixCodedToInt32(BytesRef val)

Parameters

Type	Name	Description
BytesRef	val

Returns

Type	Description
System.Int32

Exceptions

Type	Condition
System.FormatException	if the supplied BytesRef is not correctly prefix encoded.
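
Decoding reverses the encoding: the 7-bit payload bytes are reassembled, shifted back into place, and the sign bit is flipped again. The following sketch works on a raw byte[] and assumes the layout given in the class summary; the names are hypothetical and this is not the library's source.

```csharp
using System;

// Hypothetical sketch: decodes a prefix-coded 32-bit term from raw bytes.
public static class PrefixDecodingSketch
{
    // Value of SHIFT_START_INT32 as listed on this page.
    private const int ShiftStartInt32 = 96;

    public static int Decode(byte[] term)
    {
        if (term == null || term.Length == 0)
            throw new FormatException("Empty term.");

        int shift = term[0] - ShiftStartInt32;
        if (shift < 0 || shift > 31)
            throw new FormatException("Invalid shift value in prefix-coded bytes.");

        uint sortableBits = 0;
        for (int i = 1; i < term.Length; i++)
        {
            byte b = term[i];
            if (b > 0x7F)
                throw new FormatException("Payload bytes must fit in 7 bits.");
            sortableBits = (sortableBits << 7) | b;
        }

        // Restore the dropped low bits as zeros, then undo the sign-bit flip.
        return (int)((sortableBits << shift) ^ 0x80000000u);
    }
}
```

For example, the bytes 96, 8, 0, 0, 0, 0 decode to 0, and the lower-precision term 100, 64, 0, 0, 2 (shift 4) decodes to 32: the four rightmost bits of the original value are gone and come back as zeros.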

PrefixCodedToInt64(BytesRef)

Returns a System.Int64 from prefixCoded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

NOTE: This was prefixCodedToLong() in Lucene

Declaration

public static long PrefixCodedToInt64(BytesRef val)

Parameters

Type	Name	Description
BytesRef	val

Returns

Type	Description
System.Int64

Exceptions

Type	Condition
System.FormatException	if the supplied BytesRef is not correctly prefix encoded.

SingleToSortableInt32(Single)

Converts a System.Single value to a sortable signed System.Int32. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits so that the result can be compared as a System.Int32. The precision is not reduced by this, but the value can easily be used as a System.Int32. The sort order (including System.Single.NaN) follows the semantics of the original Java Float.compareTo: NaN is greater than positive infinity. Note that System.Single.CompareTo(System.Single) itself orders NaN below all other values.

NOTE: This was floatToSortableInt() in Lucene

Declaration

public static int SingleToSortableInt32(float val)

Parameters

Type	Name	Description
System.Single	val

Returns

Type	Description
System.Int32

SortableInt32ToSingle(Int32)

Converts a sortable System.Int32 back to a System.Single.

NOTE: This was sortableIntToFloat() in Lucene

Declaration

public static float SortableInt32ToSingle(int val)

Parameters

Type	Name	Description
System.Int32	val

Returns

Type	Description
System.Single

SortableInt64ToDouble(Int64)

Converts a sortable System.Int64 back to a System.Double.

NOTE: This was sortableLongToDouble() in Lucene

Declaration

public static double SortableInt64ToDouble(long val)

Parameters

Type	Name	Description
System.Int64	val

Returns

Type	Description
System.Double

SplitInt32Range(NumericUtils.Int32RangeBuilder, Int32, Int32, Int32)

Splits a System.Int32 range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its AddRange(BytesRef, BytesRef) method.

This method is used by NumericRangeQuery.

NOTE: This was splitIntRange() in Lucene

Declaration

public static void SplitInt32Range(NumericUtils.Int32RangeBuilder builder, int precisionStep, int minBound, int maxBound)

Parameters

Type	Name	Description
NumericUtils.Int32RangeBuilder	builder
System.Int32	precisionStep
System.Int32	minBound
System.Int32	maxBound
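
To give a feel for what a recursive split produces, the sketch below breaks an inclusive range into sub-ranges, each tagged with the shift (precision level) at which it can be matched. It is one possible way to perform such a split, written against plain longs and a list instead of a range builder; all names are hypothetical and it should not be read as the library's source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a trie-style range split.
public static class RangeSplitSketch
{
    public static List<(long Min, long Max, int Shift)> Split(
        int valSize, int precisionStep, long minBound, long maxBound)
    {
        if (precisionStep < 1)
            throw new ArgumentOutOfRangeException(nameof(precisionStep), "must be >= 1");

        var result = new List<(long Min, long Max, int Shift)>();
        if (minBound > maxBound) return result;

        for (int shift = 0; ; shift += precisionStep)
        {
            // Bounds for the next (coarser) precision level.
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            bool hasLower = (minBound & mask) != 0L;
            bool hasUpper = (maxBound & mask) != mask;
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            bool lowerWrapped = nextMin < minBound;
            bool upperWrapped = nextMax > maxBound;

            if (shift + precisionStep >= valSize || nextMin > nextMax
                || lowerWrapped || upperWrapped)
            {
                // No coarser level is usable: emit what is left and stop.
                Add(result, minBound, maxBound, shift);
                break;
            }

            // Emit the ragged edges at the current precision.
            if (hasLower) Add(result, minBound, minBound | mask, shift);
            if (hasUpper) Add(result, maxBound & ~mask, maxBound, shift);

            minBound = nextMin;
            maxBound = nextMax;
        }
        return result;
    }

    private static void Add(List<(long Min, long Max, int Shift)> result,
        long min, long max, int shift)
    {
        // Set the low bits that were shifted away on the upper bound.
        max |= (1L << shift) - 1L;
        result.Add((min, max, shift));
    }
}
```

With `Split(32, 4, 3, 37)` this sketch yields three sub-ranges: [3, 15] and [32, 37] at shift 0, and [16, 31] at shift 4. The middle block of 16 values is therefore covered by a single lower-precision term, while only the edges are matched at full precision.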

SplitInt64Range(NumericUtils.Int64RangeBuilder, Int32, Int64, Int64)

Splits a System.Int64 range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its AddRange(BytesRef, BytesRef) method.

This method is used by NumericRangeQuery.

NOTE: This was splitLongRange() in Lucene

Declaration

public static void SplitInt64Range(NumericUtils.Int64RangeBuilder builder, int precisionStep, long minBound, long maxBound)

Parameters

Type	Name	Description
NumericUtils.Int64RangeBuilder	builder
System.Int32	precisionStep
System.Int64	minBound
System.Int64	maxBound
