
    Class LMJelinekMercerSimilarity

    Language model based on the Jelinek-Mercer smoothing method, as described in: Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 334-342.

    The model has a single parameter, λ. According to the paper cited above, the optimal value depends on both the collection and the query: it is around 0.1 for title queries and around 0.7 for long queries.
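    For orientation, the smoothed estimate that λ controls can be written as below. This follows the Jelinek-Mercer definition in the cited paper; the symbols tf(w, d) (term frequency), |d| (document length) and p(w | C) (collection language model) are notation chosen here for illustration and do not appear on this page.

```latex
p_{\lambda}(w \mid d) = (1 - \lambda)\,\frac{\mathrm{tf}(w, d)}{|d|} + \lambda\, p(w \mid C),
\qquad 0 < \lambda < 1
```

    A small λ keeps the estimate close to the document's own term frequencies, while a large λ pulls it toward the collection model, which is consistent with the guidance that long queries benefit from heavier smoothing.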

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    Similarity
    SimilarityBase
    LMSimilarity
    LMJelinekMercerSimilarity
    Inherited Members
    LMSimilarity.m_collectionModel
    LMSimilarity.NewStats(string, float)
    LMSimilarity.FillBasicStats(BasicStats, CollectionStatistics, TermStatistics)
    LMSimilarity.ToString()
    SimilarityBase.DiscountOverlaps
    SimilarityBase.ComputeWeight(float, CollectionStatistics, params TermStatistics[])
    SimilarityBase.Explain(BasicStats, int, Explanation, float)
    SimilarityBase.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
    SimilarityBase.ComputeNorm(FieldInvertState)
    SimilarityBase.DecodeNormValue(byte)
    SimilarityBase.EncodeNormValue(float, float)
    SimilarityBase.Log2(double)
    Similarity.Coord(int, int)
    Similarity.QueryNorm(float)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Search.Similarities
    Assembly: Lucene.Net.dll
    Syntax
    public class LMJelinekMercerSimilarity : LMSimilarity

    Constructors

    LMJelinekMercerSimilarity(ICollectionModel, float)

    Instantiates with the specified collectionModel and λ parameter.

    Declaration
    public LMJelinekMercerSimilarity(LMSimilarity.ICollectionModel collectionModel, float lambda)
    Parameters
    Type Name Description
    LMSimilarity.ICollectionModel collectionModel
    float lambda

    LMJelinekMercerSimilarity(float)

    Instantiates with the specified λ parameter.

    Declaration
    public LMJelinekMercerSimilarity(float lambda)
    Parameters
    Type Name Description
    float lambda
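    A minimal usage sketch of the single-argument constructor, using only members listed on this page (the constructor, Lambda, and the inherited ToString()). The λ values are the rough guidance quoted in the class description, not tuned settings, and the class name JelinekMercerUsage is hypothetical. The example assumes the Lucene.Net package is referenced by the project.

```csharp
using System;
using Lucene.Net.Search.Similarities;

// Hypothetical demo class; not part of Lucene.Net.
public static class JelinekMercerUsage
{
    public static void Main()
    {
        // Roughly 0.1 is suggested for short (title) queries,
        // roughly 0.7 for long queries, per the class description.
        var titleQuerySimilarity = new LMJelinekMercerSimilarity(0.1f);
        var longQuerySimilarity = new LMJelinekMercerSimilarity(0.7f);

        // Lambda returns the value passed to the constructor.
        Console.WriteLine(titleQuerySimilarity.Lambda);

        // ToString() is inherited from LMSimilarity and uses GetName().
        Console.WriteLine(longQuerySimilarity.ToString());
    }
}
```

    Because the best λ depends on both the collection and the query, it is usually worth treating these two values as starting points and validating them against your own relevance judgments.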

    Properties

    Lambda

    Returns the λ parameter.

    Declaration
    public virtual float Lambda { get; }
    Property Value
    Type Description
    float

    Methods

    Explain(Explanation, BasicStats, int, float, float)

    Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.

    The default implementation does nothing.

    Declaration
    protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
    Parameters
    Type Name Description
    Explanation expl

    the explanation to extend with details.

    BasicStats stats

    the corpus level statistics.

    int doc

    the document id.

    float freq

    the term frequency.

    float docLen

    the document length.

    Overrides
    LMSimilarity.Explain(Explanation, BasicStats, int, float, float)

    GetName()

    Returns the name of the LM method. The values of the parameters should be included as well.

    Used in ToString().
    Declaration
    public override string GetName()
    Returns
    Type Description
    string
    Overrides
    LMSimilarity.GetName()

    Score(BasicStats, float, float)

    Scores the document doc.

    Subclasses must apply their scoring formula in this method.

    Declaration
    public override float Score(BasicStats stats, float freq, float docLen)
    Parameters
    Type Name Description
    BasicStats stats

    the corpus level statistics.

    float freq

    the term frequency.

    float docLen

    the document length.

    Returns
    Type Description
    float

    the score.

    Overrides
    SimilarityBase.Score(BasicStats, float, float)
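    To make the roles of freq, docLen and λ in Score more concrete, here is a standalone sketch of one common rank-equivalent form of Jelinek-Mercer scoring for a single term. It is an illustration under stated assumptions, not the library's source: the class JelinekMercerSketch, its parameter names, and the explicit boost and collectionProbability arguments are hypothetical stand-ins for values that the real method would read from BasicStats.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net. It assumes the score is
// boost * ln(1 + ((1 - lambda) * freq / docLen) / (lambda * p(w|C))),
// a rank-equivalent form of Jelinek-Mercer smoothing.
public static class JelinekMercerSketch
{
    public static float Score(
        float boost,
        float freq,
        float docLen,
        float collectionProbability,
        float lambda)
    {
        if (lambda <= 0f || lambda > 1f)
            throw new ArgumentOutOfRangeException(nameof(lambda));
        if (docLen <= 0f)
            throw new ArgumentOutOfRangeException(nameof(docLen));
        if (collectionProbability <= 0f)
            throw new ArgumentOutOfRangeException(nameof(collectionProbability));

        // Document-model part: maximum-likelihood estimate, down-weighted by (1 - lambda).
        double documentPart = (1.0 - lambda) * freq / docLen;

        // Collection-model part: background probability, weighted by lambda.
        double collectionPart = lambda * collectionProbability;

        // A term that does not occur in the document (freq == 0) contributes 0.
        return boost * (float)Math.Log(1.0 + documentPart / collectionPart);
    }
}
```

    In this form the score grows with the term frequency and shrinks as the term becomes more common in the collection, and a larger λ dampens the influence of the document's own term counts.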
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.