
    Class BM25Similarity

    BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. In Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
    Inheritance
    System.Object
    Similarity
    BM25Similarity
    Inherited Members
    Similarity.Coord(Int32, Int32)
    Similarity.QueryNorm(Single)
    Namespace: Lucene.Net.Search.Similarities
    Assembly: Lucene.Net.dll
    Syntax
    public class BM25Similarity : Similarity

    Constructors


    BM25Similarity()

    BM25 with these default values:

    • k1 = 1.2,
    • b = 0.75.

    Declaration
    public BM25Similarity()

    BM25Similarity(Single, Single)

    BM25 with the supplied parameter values.

    Declaration
    public BM25Similarity(float k1, float b)
    Parameters
    Type Name Description
    System.Single k1 Controls non-linear term frequency normalization (saturation).
    System.Single b Controls to what degree document length normalizes tf values.
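
    To make the roles of k1 and b concrete, here is a minimal standalone sketch of the textbook BM25 term-frequency factor. It is not the Lucene.NET source: the class name Bm25TfSketch, the method TfNorm, and its docLength and avgFieldLength arguments are hypothetical names chosen for illustration, and the sketch ignores the one-byte length encoding the library applies to norms.

```csharp
using System;

// Hypothetical illustration only; not part of Lucene.NET.
public static class Bm25TfSketch
{
    // Textbook BM25 term-frequency factor:
    //   freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLength / avgFieldLength))
    public static float TfNorm(float freq, float k1, float b,
                               float docLength, float avgFieldLength)
    {
        // b blends between no length normalization (b = 0) and full
        // normalization by relative document length (b = 1).
        float lengthFactor = (1f - b) + b * (docLength / avgFieldLength);

        // k1 sets how quickly the factor saturates as freq grows;
        // the result never exceeds k1 + 1.
        return freq * (k1 + 1f) / (freq + k1 * lengthFactor);
    }
}
```

    With the defaults k1 = 1.2 and b = 0.75, a single occurrence in a document of average length yields a factor of 1, and the factor approaches k1 + 1 = 2.2 as the frequency grows. With b = 0 the document length has no effect.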

    Properties


    B

    Gets the b parameter, which controls to what degree document length normalizes tf values.

    Declaration
    public virtual float B { get; }
    Property Value
    Type Description
    System.Single
    See Also
    BM25Similarity(Single, Single)

    DiscountOverlaps

    Gets or sets whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count when computing norms.

    Declaration
    public virtual bool DiscountOverlaps { get; set; }
    Property Value
    Type Description
    System.Boolean

    K1

    Gets the k1 parameter, which controls non-linear term frequency normalization (saturation).

    Declaration
    public virtual float K1 { get; }
    Property Value
    Type Description
    System.Single
    See Also
    BM25Similarity(Single, Single)

    Methods


    AvgFieldLength(CollectionStatistics)

    The default implementation computes the average as sumTotalTermFreq / maxDoc, or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes or any field that omits frequency information).

    Declaration
    protected virtual float AvgFieldLength(CollectionStatistics collectionStats)
    Parameters
    Type Name Description
    CollectionStatistics collectionStats
    Returns
    Type Description
    System.Single
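
    The fallback rule described for AvgFieldLength can be sketched as a standalone function. This is an illustration under stated assumptions, not the library code: the class AvgFieldLengthSketch is hypothetical, it takes the two statistics as plain numbers rather than a CollectionStatistics instance, and it assumes a missing sumTotalTermFreq is reported as a non-positive value such as -1.

```csharp
using System;

// Hypothetical illustration only; not part of Lucene.NET.
public static class AvgFieldLengthSketch
{
    public static float AvgFieldLength(long sumTotalTermFreq, long maxDoc)
    {
        // Assumption: a non-positive value means the statistic is not stored,
        // in which case the average falls back to 1.
        if (sumTotalTermFreq <= 0)
        {
            return 1f;
        }

        // Divide in double precision to avoid integer truncation.
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }
}
```

    For example, 1000 total term occurrences across 10 documents give an average field length of 100, while a missing statistic gives 1.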

    ComputeNorm(FieldInvertState)

    Declaration
    public override sealed long ComputeNorm(FieldInvertState state)
    Parameters
    Type Name Description
    FieldInvertState state
    Returns
    Type Description
    System.Int64
    Overrides
    Similarity.ComputeNorm(FieldInvertState)

    ComputeWeight(Single, CollectionStatistics, TermStatistics[])

    Declaration
    public override sealed Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
    Parameters
    Type Name Description
    System.Single queryBoost
    CollectionStatistics collectionStats
    TermStatistics[] termStats
    Returns
    Type Description
    Similarity.SimWeight
    Overrides
    Similarity.ComputeWeight(Single, CollectionStatistics, TermStatistics[])

    DecodeNormValue(Byte)

    The default implementation returns 1 / f^2, where f is Byte315ToSingle(Byte).

    Declaration
    protected virtual float DecodeNormValue(byte b)
    Parameters
    Type Name Description
    System.Byte b
    Returns
    Type Description
    System.Single

    EncodeNormValue(Single, Int32)

    The default implementation encodes boost / sqrt(length) with SingleToByte315(Single). This is compatible with Lucene's default implementation. If you change this, then you should change DecodeNormValue(Byte) to match.

    Declaration
    protected virtual byte EncodeNormValue(float boost, int fieldLength)
    Parameters
    Type Name Description
    System.Single boost
    System.Int32 fieldLength
    Returns
    Type Description
    System.Byte
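
    The default encoding and decoding pair up as follows. The short derivation below ignores the precision lost in the one-byte encoding; L for the field length and f for the encoded value are notation introduced here for illustration.

```latex
% L = field length, f = value handed to the byte encoder (before quantization)
f = \frac{\mathrm{boost}}{\sqrt{L}}
\qquad\Longrightarrow\qquad
\frac{1}{f^{2}} = \frac{L}{\mathrm{boost}^{2}}
```

    With a boost of 1, decoding therefore recovers approximately the field length, which is why a custom EncodeNormValue should be paired with a matching DecodeNormValue.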

    GetSimScorer(Similarity.SimWeight, AtomicReaderContext)

    Declaration
    public override sealed Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
    Parameters
    Type Name Description
    Similarity.SimWeight stats
    AtomicReaderContext context
    Returns
    Type Description
    Similarity.SimScorer
    Overrides
    Similarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)

    Idf(Int64, Int64)

    Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)).

    Declaration
    protected virtual float Idf(long docFreq, long numDocs)
    Parameters
    Type Name Description
    System.Int64 docFreq
    System.Int64 numDocs
    Returns
    Type Description
    System.Single
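
    The Idf formula above can be tried out with a small standalone sketch. The class name Bm25IdfSketch is hypothetical and the code is a re-statement of the documented formula, not a copy of the library source.

```csharp
using System;

// Hypothetical illustration only; not part of Lucene.NET.
public static class Bm25IdfSketch
{
    // log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5))
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5D) / (docFreq + 0.5D));
    }
}
```

    Because of the added 1 inside the logarithm, the result stays positive even when a term occurs in every document, and rarer terms receive larger values than common ones.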

    IdfExplain(CollectionStatistics, TermStatistics)

    Computes a score factor for a simple term and returns an explanation for that score factor.

    The default implementation uses:

        Idf(docFreq, searcher.MaxDoc);

    Note that MaxDoc is used instead of NumDocs because DocFreq is also used: when DocFreq is inaccurate, MaxDoc is inaccurate in the same direction. In addition, MaxDoc is more efficient to compute.

    Declaration
    public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
    Parameters
    Type Name Description
    CollectionStatistics collectionStats Collection-level statistics.
    TermStatistics termStats Term-level statistics for the term.
    Returns
    Type Description
    Explanation An Explanation object that includes both an idf score factor and an explanation for the term.


    IdfExplain(CollectionStatistics, TermStatistics[])

    Computes a score factor for a phrase.

    The default implementation sums the idf factor for each term in the phrase.

    Declaration
    public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)
    Parameters
    Type Name Description
    CollectionStatistics collectionStats Collection-level statistics.
    TermStatistics[] termStats Term-level statistics for the terms in the phrase.
    Returns
    Type Description
    Explanation An Explanation object that includes both an idf score factor for the phrase and an explanation for each term.
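
    The phrase rule (sum the idf factor of each term) can be sketched numerically as below. This is a simplified stand-in: the class PhraseIdfSketch and its PhraseIdf method are hypothetical, they work on raw document frequencies instead of CollectionStatistics and TermStatistics, and they return only the number rather than an Explanation.

```csharp
using System;

// Hypothetical illustration only; not part of Lucene.NET.
public static class PhraseIdfSketch
{
    // Per-term idf, restated from the documented formula.
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    // Phrase idf as the sum of the per-term idf values.
    public static float PhraseIdf(long numDocs, params long[] docFreqs)
    {
        float sum = 0f;
        foreach (long docFreq in docFreqs)
        {
            sum += Idf(docFreq, numDocs);
        }
        return sum;
    }
}
```

    A call such as PhraseIdfSketch.PhraseIdf(1000, 10, 20) adds the idf of a term found in 10 documents to that of a term found in 20 documents, for a collection of 1000 documents.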


    ScorePayload(Int32, Int32, Int32, BytesRef)

    The default implementation returns 1.

    Declaration
    protected virtual float ScorePayload(int doc, int start, int end, BytesRef payload)
    Parameters
    Type Name Description
    System.Int32 doc
    System.Int32 start
    System.Int32 end
    BytesRef payload
    Returns
    Type Description
    System.Single

    SloppyFreq(Int32)

    Implemented as 1 / (distance + 1).

    Declaration
    protected virtual float SloppyFreq(int distance)
    Parameters
    Type Name Description
    System.Int32 distance
    Returns
    Type Description
    System.Single
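
    As a quick illustration of how the sloppy-phrase weight falls off with distance, here is a standalone restatement of the formula. The class name SloppyFreqSketch is hypothetical and the code is not taken from the library.

```csharp
using System;

// Hypothetical illustration only; not part of Lucene.NET.
public static class SloppyFreqSketch
{
    // 1 / (distance + 1): an exact match (distance 0) counts fully,
    // and the contribution shrinks as the matched terms drift apart.
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }
}
```

    Distances of 0, 1, and 3 give weights of 1, 0.5, and 0.25 respectively.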

    ToString()

    Declaration
    public override string ToString()
    Returns
    Type Description
    System.String
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)