    Class BM25Similarity

    BM25 Similarity. Introduced in: Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. "Okapi at TREC-3." In Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    Similarity
    BM25Similarity
    Inherited Members
    Similarity.Coord(int, int)
    Similarity.QueryNorm(float)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Search.Similarities
    Assembly: Lucene.Net.dll
    Syntax
    public class BM25Similarity : Similarity

    Constructors

    BM25Similarity()

    BM25 with these default values:

    • k1 = 1.2,
    • b = 0.75.
    Declaration
    public BM25Similarity()

    BM25Similarity(float, float)

    BM25 with the supplied parameter values.

    Declaration
    public BM25Similarity(float k1, float b)
    Parameters
    Type Name Description
    float k1

    Controls non-linear term frequency normalization (saturation).

    float b

    Controls to what degree document length normalizes tf values.
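
    To make the roles of k1 and b concrete, the following is a minimal sketch of the textbook BM25 term-frequency component. It is not taken from the Lucene.NET source; the class name Bm25TfSketch, the method name TfNorm, and its parameters are hypothetical, and the formula is assumed to be the standard one from the BM25 literature.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and this is not the Lucene.NET source.
public static class Bm25TfSketch
{
    // Textbook BM25 term-frequency component.
    // freq:           occurrences of the term in the field
    // fieldLength:    length of this field, in tokens
    // avgFieldLength: average length of the field across the collection
    public static float TfNorm(float freq, float fieldLength, float avgFieldLength,
                               float k1 = 1.2f, float b = 0.75f)
    {
        // b interpolates between no length normalization (b = 0)
        // and full length normalization (b = 1).
        float lengthFactor = 1f - b + b * (fieldLength / avgFieldLength);

        // k1 sets how quickly repeated occurrences stop adding to the result;
        // the value approaches k1 + 1 as freq grows.
        return freq * (k1 + 1f) / (freq + k1 * lengthFactor);
    }
}
```

    With the defaults, a single occurrence in a field of average length gives a value close to 1, and the value stays below k1 + 1 however often the term repeats. With b = 0 the field length has no effect.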

    Properties

    B

    Gets the b parameter, which controls to what degree document length normalizes tf values.

    Declaration
    public virtual float B { get; }
    Property Value
    Type Description
    float
    See Also
    BM25Similarity(float, float)

    DiscountOverlaps

    Gets or sets whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count when computing norms.

    Declaration
    public virtual bool DiscountOverlaps { get; set; }
    Property Value
    Type Description
    bool
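
    The effect of this flag on the field length used for the norm can be sketched as follows. This is an illustration under assumptions, not the library's code: the class OverlapLengthSketch and the method FieldLengthForNorm are hypothetical names, and the token counts are passed in directly rather than read from a FieldInvertState.

```csharp
// Illustrative sketch only; names are hypothetical.
public static class OverlapLengthSketch
{
    // totalTokens:   every token indexed for the field
    // overlapTokens: tokens whose position increment is 0
    //                (for example, tokens stacked on the same position as another token)
    public static int FieldLengthForNorm(int totalTokens, int overlapTokens,
                                         bool discountOverlaps = true)
    {
        // When overlaps are discounted, they do not lengthen the field.
        return discountOverlaps ? totalTokens - overlapTokens : totalTokens;
    }
}
```

    For a field with 10 tokens, 3 of which are overlaps, the sketch yields a length of 7 with the default setting and 10 when the flag is false.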

    K1

    Gets the k1 parameter, which controls non-linear term frequency normalization (saturation).

    Declaration
    public virtual float K1 { get; }
    Property Value
    Type Description
    float
    See Also
    BM25Similarity(float, float)

    Methods

    AvgFieldLength(CollectionStatistics)

    The default implementation computes the average as sumTotalTermFreq / maxDoc, or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes or any field that omits frequency information).

    Declaration
    protected virtual float AvgFieldLength(CollectionStatistics collectionStats)
    Parameters
    Type Name Description
    CollectionStatistics collectionStats
    Returns
    Type Description
    float
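
    The rule described above can be sketched in a few lines. The class name AvgFieldLengthSketch is hypothetical, the statistics are passed as plain numbers instead of a CollectionStatistics instance, and treating a non-positive sumTotalTermFreq as "not stored" is an assumption made for this illustration.

```csharp
// Illustrative sketch only; names are hypothetical.
public static class AvgFieldLengthSketch
{
    public static float AvgFieldLength(long sumTotalTermFreq, long maxDoc)
    {
        // Assumption: a non-positive sum signals that the statistic is unavailable.
        if (sumTotalTermFreq <= 0)
        {
            return 1f;
        }

        // Divide in double precision, then narrow to float.
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }
}
```

    For example, 1000 total term occurrences across 100 documents gives an average field length of 10, while an unavailable sum falls back to 1.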

    ComputeNorm(FieldInvertState)

    Computes the normalization value for a field, given the accumulated state of term processing for this field (see FieldInvertState).

    Matches in longer fields are less precise, so implementations of this method usually set smaller values when state.Length is large, and larger values when state.Length is small.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Declaration
    public override sealed long ComputeNorm(FieldInvertState state)
    Parameters
    Type Name Description
    FieldInvertState state

    current processing state for this field

    Returns
    Type Description
    long

    computed norm value

    Overrides
    Similarity.ComputeNorm(FieldInvertState)

    ComputeWeight(float, CollectionStatistics, params TermStatistics[])

    Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.

    Declaration
    public override sealed Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
    Parameters
    Type Name Description
    float queryBoost

    the query-time boost.

    CollectionStatistics collectionStats

    collection-level statistics, such as the number of tokens in the collection.

    TermStatistics[] termStats

    term-level statistics, such as the document frequency of a term across the collection.

    Returns
    Type Description
    Similarity.SimWeight

    Similarity.SimWeight object with the information this Similarity needs to score a query.

    Overrides
    Similarity.ComputeWeight(float, CollectionStatistics, params TermStatistics[])

    DecodeNormValue(byte)

    The default implementation returns 1 / f^2, where f is Byte315ToSingle(byte).

    Declaration
    protected virtual float DecodeNormValue(byte b)
    Parameters
    Type Name Description
    byte b
    Returns
    Type Description
    float

    EncodeNormValue(float, int)

    The default implementation encodes boost / sqrt(length) with SingleToByte315(float). This is compatible with Lucene's default implementation. If you change this, then you should change DecodeNormValue(byte) to match.

    Declaration
    protected virtual byte EncodeNormValue(float boost, int fieldLength)
    Parameters
    Type Name Description
    float boost
    int fieldLength
    Returns
    Type Description
    byte
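
    The relationship between EncodeNormValue and DecodeNormValue can be illustrated with a self-contained sketch. Everything here is an assumption made for illustration: the class NormCodecSketch is hypothetical, and the two small-float helpers are a reimplementation of what the "315" byte encoding is understood to do, not a copy of the library's code. Because the encoder stores boost / sqrt(length) and the decoder returns 1 / f^2, a boost of 1 decodes back to an approximation of the field length.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and the bit layout is an assumption.
public static class NormCodecSketch
{
    // Offset between the float exponent bias and the small-float exponent bias.
    private const int ZeroPoint = (63 - 15) << 3;

    public static byte SingleToByte315(float f)
    {
        int bits = BitConverter.SingleToInt32Bits(f);
        int small = bits >> (24 - 3);
        if (small <= ZeroPoint)
        {
            // Zero and negative values map to 0; tiny positive values map to 1.
            return (bits <= 0) ? (byte)0 : (byte)1;
        }
        if (small >= ZeroPoint + 0x100)
        {
            return 0xFF; // too large to represent: clamp to the maximum
        }
        return (byte)(small - ZeroPoint);
    }

    public static float Byte315ToSingle(byte b)
    {
        if (b == 0) return 0f;
        int bits = b << (24 - 3);
        bits += (63 - 15) << 24;
        return BitConverter.Int32BitsToSingle(bits);
    }

    // Encodes boost / sqrt(length) into a single byte.
    public static byte EncodeNormValue(float boost, int fieldLength)
    {
        return SingleToByte315(boost / (float)Math.Sqrt(fieldLength));
    }

    // Inverts the encoding: 1 / f^2 recovers length / boost^2.
    public static float DecodeNormValue(byte b)
    {
        float f = Byte315ToSingle(b);
        return 1f / (f * f);
    }
}
```

    In this sketch, lengths that are powers of 4 (1, 4, 16, ...) round-trip exactly with a boost of 1, while most other lengths come back only approximately because a single byte cannot hold every value.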

    GetSimScorer(SimWeight, AtomicReaderContext)

    Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.

    Declaration
    public override sealed Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
    Parameters
    Type Name Description
    Similarity.SimWeight stats
    AtomicReaderContext context

    segment of the inverted index to be scored.

    Returns
    Type Description
    Similarity.SimScorer

    A Similarity.SimScorer for scoring documents across context.

    Overrides
    Similarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
    Exceptions
    Type Condition
    IOException

    if there is a low-level I/O error

    Idf(long, long)

    Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)).

    Declaration
    protected virtual float Idf(long docFreq, long numDocs)
    Parameters
    Type Name Description
    long docFreq
    long numDocs
    Returns
    Type Description
    float
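
    The formula above translates directly into code. The following sketch uses a hypothetical class name, IdfSketch, and is meant only to show the arithmetic; it is not the library's implementation.

```csharp
using System;

// Illustrative sketch only; the class name is hypothetical.
public static class IdfSketch
{
    // log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5)), using the natural logarithm.
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5D) / (docFreq + 0.5D));
    }
}
```

    Because of the added 1 inside the logarithm, the result stays positive even when a term occurs in every document, and rarer terms receive larger values than common ones.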

    IdfExplain(CollectionStatistics, TermStatistics)

    Computes a score factor for a simple term and returns an explanation for that score factor.

    The default implementation uses:
    Idf(termStats.DocFreq, collectionStats.MaxDoc);

    Note that MaxDoc is used instead of NumDocs because DocFreq is also used: when DocFreq is inaccurate, MaxDoc is inaccurate in the same direction. In addition, MaxDoc is more efficient to compute.

    Declaration
    public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
    Parameters
    Type Name Description
    CollectionStatistics collectionStats

    collection-level statistics

    TermStatistics termStats

    term-level statistics for the term

    Returns
    Type Description
    Explanation

    an Explanation object that includes both an idf score factor and an explanation for the term.

    IdfExplain(CollectionStatistics, TermStatistics[])

    Computes a score factor for a phrase.

    The default implementation sums the idf factor for each term in the phrase.
    Declaration
    public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)
    Parameters
    Type Name Description
    CollectionStatistics collectionStats

    collection-level statistics

    TermStatistics[] termStats

    term-level statistics for the terms in the phrase

    Returns
    Type Description
    Explanation

    an Explanation object that includes both an idf score factor for the phrase and an explanation for each term.

    ScorePayload(int, int, int, BytesRef)

    The default implementation returns 1.

    Declaration
    protected virtual float ScorePayload(int doc, int start, int end, BytesRef payload)
    Parameters
    Type Name Description
    int doc
    int start
    int end
    BytesRef payload
    Returns
    Type Description
    float

    SloppyFreq(int)

    Implemented as 1 / (distance + 1).

    Declaration
    protected virtual float SloppyFreq(int distance)
    Parameters
    Type Name Description
    int distance
    Returns
    Type Description
    float
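
    As a quick illustration of the formula above, the sketch below shows how the value falls off with distance. The class name SloppyFreqSketch is hypothetical and the code is not the library's implementation.

```csharp
// Illustrative sketch only; the class name is hypothetical.
public static class SloppyFreqSketch
{
    // 1 / (distance + 1): an exact match (distance 0) counts as 1,
    // and each extra unit of distance reduces the contribution.
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }
}
```

    A distance of 0 yields 1, a distance of 1 yields 0.5, and a distance of 3 yields 0.25.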

    ToString()

    Returns a string that represents the current object.

    Declaration
    public override string ToString()
    Returns
    Type Description
    string

    A string that represents the current object.

    Overrides
    object.ToString()
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.