    Class DFRSimilarity

    Implements the divergence from randomness (DFR) framework introduced in: Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389.

    The DFR scoring formula is composed of three separate components: the basic model, the aftereffect and an additional normalization component, represented by the classes BasicModel, AfterEffect and Normalization, respectively. The names of these classes were chosen to match the names of their counterparts in the Terrier IR engine.

    To construct a DFRSimilarity, you must specify the implementations for all three components of DFR:

    Component Implementations
    BasicModel: Basic model of information content:
    • BasicModelBE: Limiting form of Bose-Einstein
    • BasicModelG: Geometric approximation of Bose-Einstein
    • BasicModelP: Poisson approximation of the Binomial
    • BasicModelD: Divergence approximation of the Binomial
    • BasicModelIn: Inverse document frequency
    • BasicModelIne: Inverse expected document frequency [mixture of Poisson and IDF]
    • BasicModelIF: Inverse term frequency [approximation of I(ne)]
    AfterEffect: First normalization of information gain:
    • AfterEffectL: Laplace's law of succession
    • AfterEffectB: Ratio of two Bernoulli processes
    • AfterEffect.NoAfterEffect: No first normalization
    Normalization: Second (length) normalization:
    • NormalizationH1: Uniform distribution of term frequency
    • NormalizationH2: Term frequency density inversely related to length
    • NormalizationH3: Term frequency normalization provided by Dirichlet prior
    • NormalizationZ: Term frequency normalization provided by a Zipfian relation
    • Normalization.NoNormalization: No second normalization
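
As an illustration of the table above, the following minimal sketch picks one implementation per component and passes all three to the constructor. The wrapper class DfrConstructionSketch is hypothetical, and the parameterless constructors of the component classes are assumed to be available as in Lucene.Net 4.8.

```csharp
using Lucene.Net.Search.Similarities;

// Hypothetical wrapper class, used only to make the example self-contained.
public static class DfrConstructionSketch
{
    // Builds a DFRSimilarity from one implementation of each component.
    public static DFRSimilarity Create()
    {
        // Basic model: geometric approximation of Bose-Einstein.
        BasicModel basicModel = new BasicModelG();
        // First normalization: Laplace's law of succession.
        AfterEffect afterEffect = new AfterEffectL();
        // Second (length) normalization: H2 with its default parameter (assumed).
        Normalization normalization = new NormalizationH2();

        return new DFRSimilarity(basicModel, afterEffect, normalization);
    }
}
```

In a typical setup the returned instance would be assigned to both IndexWriterConfig.Similarity and IndexSearcher.Similarity, so that indexing and searching use the same model.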

    Note that qtf, the multiplicity of term-occurrence in the query, is not handled by this implementation.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    Similarity
    SimilarityBase
    DFRSimilarity
    Inherited Members
    SimilarityBase.DiscountOverlaps
    SimilarityBase.ComputeWeight(float, CollectionStatistics, params TermStatistics[])
    SimilarityBase.NewStats(string, float)
    SimilarityBase.FillBasicStats(BasicStats, CollectionStatistics, TermStatistics)
    SimilarityBase.Explain(BasicStats, int, Explanation, float)
    SimilarityBase.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
    SimilarityBase.ComputeNorm(FieldInvertState)
    SimilarityBase.DecodeNormValue(byte)
    SimilarityBase.EncodeNormValue(float, float)
    SimilarityBase.Log2(double)
    Similarity.Coord(int, int)
    Similarity.QueryNorm(float)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Search.Similarities
    Assembly: Lucene.Net.dll
    Syntax
    public class DFRSimilarity : SimilarityBase

    Constructors

    DFRSimilarity(BasicModel, AfterEffect, Normalization)

    Creates DFRSimilarity from the three components.

    Note that null values are not allowed: to disable the second normalization or the after-effect, pass an instance of Normalization.NoNormalization or AfterEffect.NoAfterEffect, respectively.
    Declaration
    public DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization)
    Parameters
    Type           Name           Description
    BasicModel     basicModel     Basic model of information content
    AfterEffect    afterEffect    First normalization of information gain
    Normalization  normalization  Second (length) normalization

    Exceptions
    Type                   Condition
    ArgumentNullException  basicModel, afterEffect, or normalization is null.
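
The null rule above can be sketched as follows. DfrNullHandlingSketch and its methods are hypothetical names; the nested NoAfterEffect and NoNormalization classes are assumed to expose public parameterless constructors.

```csharp
using System;
using Lucene.Net.Search.Similarities;

// Hypothetical helper class illustrating the constructor's null handling.
public static class DfrNullHandlingSketch
{
    // Disables both optional steps by passing the explicit "no-op" components.
    public static DFRSimilarity WithoutOptionalComponents()
    {
        return new DFRSimilarity(
            new BasicModelIn(),
            new AfterEffect.NoAfterEffect(),
            new Normalization.NoNormalization());
    }

    // Returns true if passing null for a component is rejected
    // with ArgumentNullException, as the documentation states.
    public static bool RejectsNull()
    {
        try
        {
            _ = new DFRSimilarity(new BasicModelIn(), null, new NormalizationH1());
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```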

    See Also
    BasicModel
    AfterEffect
    Normalization

    Fields

    m_afterEffect

    The first normalization of the information content.

    Declaration
    protected readonly AfterEffect m_afterEffect
    Field Value
    Type Description
    AfterEffect

    m_basicModel

    The basic model for information content.

    Declaration
    protected readonly BasicModel m_basicModel
    Field Value
    Type Description
    BasicModel

    m_normalization

    The term frequency normalization.

    Declaration
    protected readonly Normalization m_normalization
    Field Value
    Type Description
    Normalization

    Properties

    AfterEffect

    Returns the first normalization.

    Declaration
    public virtual AfterEffect AfterEffect { get; }
    Property Value
    Type Description
    AfterEffect

    BasicModel

    Returns the basic model of information content.

    Declaration
    public virtual BasicModel BasicModel { get; }
    Property Value
    Type Description
    BasicModel

    Normalization

    Returns the second normalization.

    Declaration
    public virtual Normalization Normalization { get; }
    Property Value
    Type Description
    Normalization

    Methods

    Explain(Explanation, BasicStats, int, float, float)

    Extends expl with details of the DFR scoring formula. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; this override adds clauses that explain the individual scoring components.

    The default implementation in SimilarityBase does nothing.

    Declaration
    protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
    Parameters
    Type         Name    Description
    Explanation  expl    the explanation to extend with details.
    BasicStats   stats   the corpus level statistics.
    int          doc     the document id.
    float        freq    the term frequency.
    float        docLen  the document length.

    Overrides
    SimilarityBase.Explain(Explanation, BasicStats, int, float, float)
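
Explain is protected, so it is normally reached through IndexSearcher.Explain. The sketch below indexes one document and returns the explanation text for a single term query. DfrExplainSketch, the field name "body", and the sample text are hypothetical; StandardAnalyzer is assumed to come from the Lucene.Net.Analysis.Common package. The exact wording of the output is not asserted here.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Similarities;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper class: index one document, then explain its score.
public static class DfrExplainSketch
{
    public static string ExplainTopHit()
    {
        var similarity = new DFRSimilarity(
            new BasicModelG(), new AfterEffectL(), new NormalizationH2());

        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        // Use the same similarity at index time and at search time.
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
        {
            Similarity = similarity
        };
        using (var writer = new IndexWriter(directory, config))
        {
            var doc = new Document();
            doc.Add(new TextField("body", "divergence from randomness", Field.Store.NO));
            writer.AddDocument(doc);
        }

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader) { Similarity = similarity };

        var query = new TermQuery(new Term("body", "randomness"));
        TopDocs hits = searcher.Search(query, 1);

        // The searcher builds the explanation and calls into the similarity.
        Explanation explanation = searcher.Explain(query, hits.ScoreDocs[0].Doc);
        return explanation.ToString();
    }
}
```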

    Score(BasicStats, float, float)

    Scores the document from the given term frequency and document length.

    Subclasses must apply their scoring formula in this method.

    Declaration
    public override float Score(BasicStats stats, float freq, float docLen)
    Parameters
    Type        Name    Description
    BasicStats  stats   the corpus level statistics.
    float       freq    the term frequency.
    float       docLen  the document length.

    Returns
    Type   Description
    float  the score.

    Overrides
    SimilarityBase.Score(BasicStats, float, float)
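
How the three components might combine into one score can be illustrated without the library. This sketch assumes the score is the boost multiplied by the basic-model value and the after-effect value, both evaluated at the normalized term frequency tfn. It uses the textbook forms of I(n), L, and H2 from the DFR literature. DfrScoreSketch and all of its members are hypothetical, and Lucene.Net's own classes may differ in details such as smoothing constants.

```csharp
using System;

// Hypothetical, library-free sketch of a DFR score (I(n) + L + H2).
public static class DfrScoreSketch
{
    // Base-2 logarithm.
    private static double Log2(double x) => Math.Log(x) / Math.Log(2.0);

    // Normalization H2 (assumed form): tfn = freq * log2(1 + c * avgLen / docLen).
    public static double TfnH2(double freq, double docLen, double avgLen, double c = 1.0)
        => freq * Log2(1.0 + c * avgLen / docLen);

    // Basic model I(n) (assumed form): tfn * log2((N + 1) / (n + 0.5)),
    // where N is the number of documents and n the document frequency.
    public static double BasicModelIn(double tfn, long numDocs, long docFreq)
        => tfn * Log2((numDocs + 1.0) / (docFreq + 0.5));

    // After-effect L (assumed form): 1 / (tfn + 1).
    public static double AfterEffectL(double tfn) => 1.0 / (tfn + 1.0);

    // Assumed composition: boost * basicModel(tfn) * afterEffect(tfn).
    public static double Score(double boost, double freq, double docLen,
                               double avgLen, long numDocs, long docFreq)
    {
        double tfn = TfnH2(freq, docLen, avgLen);
        return boost * BasicModelIn(tfn, numDocs, docFreq) * AfterEffectL(tfn);
    }
}
```

When docLen equals avgLen and c is 1, the H2 step leaves the term frequency unchanged, which makes the other two factors easy to check by hand.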

    ToString()

    Returns the name of the Similarity and, preferably, the values of its parameters (if any). Subclasses of SimilarityBase must override this method.

    Declaration
    public override string ToString()
    Returns
    Type Description
    string
    Overrides
    SimilarityBase.ToString()

    See Also

    BasicModel
    AfterEffect
    Normalization
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.