    Class NumericRangeQuery<T>

    A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using Int32Field, SingleField, Int64Field or DoubleField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery.
    NumericRangeFilter is the filter equivalent of this query.

    You create a new NumericRangeQuery<T> with the static factory methods, e.g.:

    Query q = NumericRangeQuery.NewSingleRange("weight", 0.03f, 0.10f, true, true);
    matches all documents whose System.Single valued "weight" field ranges from 0.03 to 0.10, inclusive.

    The performance of NumericRangeQuery<T> is much better than the corresponding TermRangeQuery because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.

    You can optionally specify a precisionStep when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details.

    This query defaults to CONSTANT_SCORE_AUTO_REWRITE_DEFAULT. With precision steps of <=4, this query can be run with one of the BooleanQuery rewrite methods without changing BooleanQuery's default max clause count.

    How it works

    See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):

    Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023

    A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values such as System.Double, System.Int64, System.Single, and System.Int32 are converted to lexicographically sortable string representations and stored with different precisions; for a more detailed description of how the values are stored, see NumericUtils). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.

    For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of

    7*255*2 + 255 = 3825
    distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).

    Precision Step

    You can choose any precisionStep when encoding values. Lower step values mean more precisions and therefore more terms in the index (so the index gets larger). The number of indexed terms per value (these are generated by NumericTokenStream) is:

    indexedTermsPerValue = ceil(bitsPerValue / precisionStep)

    As the lower precision terms are shared by many values, the additional terms only slightly grow the term dictionary (approx. 7% for precisionStep=4), but have a larger impact on the postings (the postings file will have more entries, as every document is linked to indexedTermsPerValue terms instead of one). The formula to estimate the growth of the term dictionary in comparison to one term per value:

    \mathrm{termDictOverhead} = \sum\limits_{i=0}^{\mathrm{indexedTermsPerValue}-1} \frac{1}{2^{\mathrm{precisionStep}\cdot i}}
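As a worked check of the approx. 7% figure, take 64-bit values with precisionStep = 4, so that indexedTermsPerValue = 16. The sum is then a finite geometric series with ratio 2^{-4} = 1/16:

```latex
\mathrm{termDictOverhead} = \sum\limits_{i=0}^{15} \frac{1}{2^{4 \cdot i}} = \frac{1 - 16^{-16}}{1 - \frac{1}{16}} \approx \frac{16}{15} \approx 1.067
```

That is roughly 6.7% more terms than with one term per value, which is consistent with the approx. 7% quoted above.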

    On the other hand, if the precisionStep is smaller, the maximum number of terms to match is reduced, which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while executing the query is:

    \mathrm{maxQueryTerms} = \left[ \left( \mathrm{indexedTermsPerValue} - 1 \right) \cdot \left(2^\mathrm{precisionStep} - 1 \right) \cdot 2 \right] + \left( 2^\mathrm{precisionStep} - 1 \right)

    For longs stored using a precision step of 4, maxQueryTerms = 15*15*2 + 15 = 465, and for a precision step of 2, maxQueryTerms = 31*3*2 + 3 = 189. However, the gain in search speed is partly offset by more seeking in the term enum of the index. Because of this, the ideal precisionStep value can only be found by testing. Important: You can index with a lower precision step value and test search speed using a multiple of the original step value.
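The formulas in this section can be evaluated for a few candidate steps before running real benchmarks. The following is a minimal sketch only: the class PrecisionStepMath and its method names are hypothetical helpers written for illustration, they are not part of Lucene.NET, and they do nothing more than reproduce the arithmetic given above. The limits on bitsPerValue (1 to 64) and on precisionStep in MaxQueryTerms (1 to 32) are assumptions made to keep the integer arithmetic from overflowing.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of Lucene.NET.
public static class PrecisionStepMath
{
    // indexedTermsPerValue = ceil(bitsPerValue / precisionStep)
    public static int IndexedTermsPerValue(int bitsPerValue, int precisionStep)
    {
        if (bitsPerValue < 1 || bitsPerValue > 64)
            throw new ArgumentOutOfRangeException(nameof(bitsPerValue));
        if (precisionStep < 1)
            throw new ArgumentOutOfRangeException(nameof(precisionStep));

        // A step at or above the value width yields a single term per value.
        if (precisionStep >= bitsPerValue) return 1;

        // Integer ceiling division.
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    // termDictOverhead = sum over i in [0, indexedTermsPerValue) of 1 / 2^(precisionStep * i)
    public static double TermDictOverhead(int bitsPerValue, int precisionStep)
    {
        int terms = IndexedTermsPerValue(bitsPerValue, precisionStep);
        double sum = 0.0;
        for (int i = 0; i < terms; i++)
        {
            sum += Math.Pow(2.0, -(double)precisionStep * i);
        }
        return sum;
    }

    // maxQueryTerms = (indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1)
    public static long MaxQueryTerms(int bitsPerValue, int precisionStep)
    {
        // Assumption: restrict the step so that the 64-bit arithmetic cannot overflow.
        if (precisionStep < 1 || precisionStep > 32)
            throw new ArgumentOutOfRangeException(nameof(precisionStep));

        long terms = IndexedTermsPerValue(bitsPerValue, precisionStep);
        long perLevel = (1L << precisionStep) - 1;
        return (terms - 1) * perLevel * 2 + perLevel;
    }
}
```

For example, PrecisionStepMath.MaxQueryTerms(64, 4) evaluates to 465 and PrecisionStepMath.MaxQueryTerms(64, 2) to 189, matching the figures above, while PrecisionStepMath.MaxQueryTerms(64, 8) gives the theoretical maximum of 3825 mentioned earlier. These are upper bounds, so they complement rather than replace measuring search speed on real data.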

    Good values for precisionStep depend on usage and data type:

    • The default for all data types is 4, which is used when no precisionStep is given.
    • Ideal value in most cases for 64 bit data types (long, double) is 6 or 8.
    • Ideal value in most cases for 32 bit data types (int, float) is 4.
    • For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use System.Int32.MaxValue (see below).
    • Steps >=64 for long/double and >=32 for int/float produce one token per value in the index, and querying is as slow as a conventional TermRangeQuery. But such steps can be used to produce fields that are solely used for sorting (in this case simply use System.Int32.MaxValue as precisionStep). Using Int32Field, Int64Field, SingleField or DoubleField for sorting is ideal, because building the field cache is much faster than with text-only numbers. These fields have one term per value and therefore also work with term enumeration for building distinct lists (e.g. facets / preselected values to search for). Sorting is also possible with range query optimized fields using one of the above precisionStep values.

    Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that TermRangeQuery in boolean rewrite mode (with raised BooleanQuery clause count) took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit precision step). This query type was developed for a geographic portal, where the performance for e.g. bounding boxes or exact date/time stamps is important.

    @since 2.9

    Inheritance
    System.Object
    Query
    MultiTermQuery
    NumericRangeQuery<T>
    Inherited Members
    MultiTermQuery.m_field
    MultiTermQuery.m_rewriteMethod
    MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE
    MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE
    MultiTermQuery.CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE
    MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT
    MultiTermQuery.Field
    MultiTermQuery.GetTermsEnum(Terms)
    MultiTermQuery.Rewrite(IndexReader)
    MultiTermQuery.MultiTermRewriteMethod
    Query.Boost
    Query.ToString()
    Query.CreateWeight(IndexSearcher)
    Query.ExtractTerms(ISet<Term>)
    Query.Clone()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Search
    Assembly: Lucene.Net.dll
    Syntax
    public sealed class NumericRangeQuery<T> : MultiTermQuery where T : struct, IComparable<T>
    Type Parameters
    Name Description
    T

    Properties


    IncludesMax

    Returns true if the upper endpoint is inclusive

    Declaration
    public bool IncludesMax { get; }
    Property Value
    Type Description
    System.Boolean

    IncludesMin

    Returns true if the lower endpoint is inclusive

    Declaration
    public bool IncludesMin { get; }
    Property Value
    Type Description
    System.Boolean

    Max

    Returns the upper value of this range query

    Declaration
    public T? Max { get; }
    Property Value
    Type Description
    System.Nullable<T>

    Min

    Returns the lower value of this range query

    Declaration
    public T? Min { get; }
    Property Value
    Type Description
    System.Nullable<T>

    PrecisionStep

    Returns the precision step.

    Declaration
    public int PrecisionStep { get; }
    Property Value
    Type Description
    System.Int32

    Methods


    Equals(Object)

    Declaration
    public override bool Equals(object o)
    Parameters
    Type Name Description
    System.Object o
    Returns
    Type Description
    System.Boolean
    Overrides
    MultiTermQuery.Equals(Object)

    GetHashCode()

    Declaration
    public override int GetHashCode()
    Returns
    Type Description
    System.Int32
    Overrides
    MultiTermQuery.GetHashCode()

    GetTermsEnum(Terms, AttributeSource)

    Declaration
    protected override TermsEnum GetTermsEnum(Terms terms, AttributeSource atts)
    Parameters
    Type Name Description
    Terms terms
    AttributeSource atts
    Returns
    Type Description
    TermsEnum
    Overrides
    MultiTermQuery.GetTermsEnum(Terms, AttributeSource)

    ToString(String)

    Declaration
    public override string ToString(string field)
    Parameters
    Type Name Description
    System.String field
    Returns
    Type Description
    System.String
    Overrides
    Query.ToString(String)
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)