Class LMDirichletSimilarity
Bayesian smoothing using Dirichlet priors. From Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 334-342.
The formula as defined in the paper assigns a negative score to documents that
contain the term, but with fewer occurrences than predicted by the collection
language model. The Lucene implementation returns 0 for such documents.
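To make the clamping concrete, here is a sketch of the Dirichlet-smoothed term score as it is usually written for the upstream Lucene implementation. It is not taken from this page. The symbols are notation introduced here: $f$ is the term frequency in the document, $|d|$ the document length, $p(w \mid C)$ the collection language model probability, and $b$ the query boost, assumed positive.

$$
\mathrm{score}(w, d) = \max\!\left(0,\; b\left[\log\!\left(1 + \frac{f}{\mu\, p(w \mid C)}\right) + \log\frac{\mu}{|d| + \mu}\right]\right)
$$

Under these assumptions the bracketed sum is negative exactly when

$$
\left(1 + \frac{f}{\mu\, p(w \mid C)}\right)\frac{\mu}{|d| + \mu} < 1
\;\Longleftrightarrow\;
\mu + \frac{f}{p(w \mid C)} < |d| + \mu
\;\Longleftrightarrow\;
f < p(w \mid C)\,|d|,
$$

that is, when the term occurs fewer times than the collection model predicts for a document of length $|d|$. These are the documents that receive a score of 0.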
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Search.Similarities
Assembly: Lucene.Net.dll
Syntax
public class LMDirichletSimilarity : LMSimilarity
Constructors
LMDirichletSimilarity()
Instantiates the similarity with the default μ value of 2000.
Declaration
public LMDirichletSimilarity()
LMDirichletSimilarity(ICollectionModel)
Instantiates the similarity with the provided collection model and the default μ value of 2000.
Declaration
public LMDirichletSimilarity(LMSimilarity.ICollectionModel collectionModel)
Parameters
Type | Name | Description |
---|---|---|
LMSimilarity.ICollectionModel | collectionModel | The collection model to use. |
LMDirichletSimilarity(ICollectionModel, float)
Instantiates the similarity with the provided μ parameter.
Declaration
public LMDirichletSimilarity(LMSimilarity.ICollectionModel collectionModel, float mu)
Parameters
Type | Name | Description |
---|---|---|
LMSimilarity.ICollectionModel | collectionModel | The collection model to use. |
float | mu | The μ parameter. |
LMDirichletSimilarity(float)
Instantiates the similarity with the provided μ parameter.
Declaration
public LMDirichletSimilarity(float mu)
Parameters
Type | Name | Description |
---|---|---|
float | mu | The μ parameter. |
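As a minimal sketch of how these constructors might be used, the snippet below builds one instance with the default μ and one with an explicit μ, then reads the value back through `Mu`. The class name `LMDirichletConstructionExample` and the value 1000 are illustrative assumptions, not recommendations from this page.

```csharp
using System;
using Lucene.Net.Search.Similarities;

// Hypothetical example class; only LMDirichletSimilarity comes from Lucene.NET.
public static class LMDirichletConstructionExample
{
    public static void Main()
    {
        // Parameterless constructor: uses the documented default μ of 2000.
        var defaultSimilarity = new LMDirichletSimilarity();

        // Explicit μ: smaller values apply less smoothing toward the collection model.
        // The value 1000 is an arbitrary illustration.
        var tunedSimilarity = new LMDirichletSimilarity(1000f);

        Console.WriteLine(defaultSimilarity.Mu);
        Console.WriteLine(tunedSimilarity.Mu);
    }
}
```

In a typical setup the same instance would be assigned to the `Similarity` property of both `IndexWriterConfig` and `IndexSearcher`, so that index-time and search-time scoring agree.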
Properties
Mu
Returns the μ parameter.
Declaration
public virtual float Mu { get; }
Property Value
Type | Description |
---|---|
float | The μ parameter. |
Methods
Explain(Explanation, BasicStats, int, float, float)
Subclasses should implement this method to explain the score. expl
already contains the score, the name of the class and the doc id, as well
as the term frequency and its explanation; subclasses can add additional
clauses to explain details of their scoring formulae.
The default implementation does nothing.
Declaration
protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
Explanation | expl | the explanation to extend with details. |
BasicStats | stats | the corpus level statistics. |
int | doc | the document id. |
float | freq | the term frequency. |
float | docLen | the document length. |
Overrides
GetName()
Returns the name of the LM method. The values of the parameters should be included as well.
Used in ToString().
Declaration
public override string GetName()
Returns
Type | Description |
---|---|
string | The name of the LM method, including its parameter values. |
Overrides
Score(BasicStats, float, float)
Scores the document doc.
Subclasses must apply their scoring formula in this method.
Declaration
public override float Score(BasicStats stats, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
BasicStats | stats | the corpus level statistics. |
float | freq | the term frequency. |
float | docLen | the document length. |
Returns
Type | Description |
---|---|
float | the score. |
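The following standalone sketch shows the kind of computation this method performs, including the clamp to 0 described in the class summary. It assumes the formula used by the upstream Lucene implementation and does not call Lucene.NET. `DirichletScoreSketch` and its parameters `boost` and `collectionProbability` are hypothetical names that stand in for values the real method reads from `stats`.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class DirichletScoreSketch
{
    // boost and collectionProbability stand in for values that the real
    // Score method would obtain from its BasicStats argument (an assumption).
    public static float Score(float boost, float freq, float docLen,
                              float mu, float collectionProbability)
    {
        // Term weight: grows with the frequency relative to the smoothed collection estimate.
        double termWeight = Math.Log(1 + freq / (mu * collectionProbability));

        // Document norm: always <= 0, penalizes longer documents.
        double docNorm = Math.Log(mu / (docLen + mu));

        float score = boost * (float)(termWeight + docNorm);

        // Clamp: documents with fewer occurrences than predicted get 0, not a negative score.
        return score > 0.0f ? score : 0.0f;
    }

    public static void Main()
    {
        // With p(w|C) = 0.05 and docLen = 100, the collection model predicts 5 occurrences.
        Console.WriteLine(Score(1f, 10f, 100f, 2000f, 0.05f)); // above prediction: positive
        Console.WriteLine(Score(1f, 2f, 100f, 2000f, 0.05f));  // below prediction: clamped to 0
    }
}
```

The two calls differ only in term frequency, which isolates the effect of the clamp: the unclamped sum changes sign where the frequency crosses the predicted count of 5.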