Class IBSimilarity
Provides a framework for the family of information-based models, as described in Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.
The retrieval function is of the form $RSV(q, d) = \sum -x_w^q \log \mathrm{Prob}(X_w \ge t_w^d \mid \lambda_w)$, where
- $x_w^q$ is the query boost;
- $X_w$ is a random variable that counts the occurrences of word $w$;
- $t_w^d$ is the normalized term frequency;
- $\lambda_w$ is a parameter.
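As a worked instance of the retrieval function, the following assumes the log-logistic distribution from the cited paper with its shape parameter fixed to 1, and a natural logarithm. Under those assumptions the tail probability has a closed form, and the negative log turns into a ratio that grows with the normalized term frequency:

```latex
\mathrm{Prob}(X_w \ge t_w^d \mid \lambda_w) = \frac{\lambda_w}{t_w^d + \lambda_w}
\quad\Longrightarrow\quad
RSV(q, d) = \sum x_w^q \, \log \frac{t_w^d + \lambda_w}{\lambda_w}
```

For example, with $x_w^q = 1$, $t_w^d = 3$ and $\lambda_w = 0.1$, a single term contributes $\log(3.1 / 0.1) = \log 31 \approx 3.43$. A rarer term (smaller $\lambda_w$) with the same frequency contributes more.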
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). The two Similarities may be merged at some point.
To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
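Component | Implementations |
---|---|
Distribution: Probabilistic distribution used to model term occurrence | DistributionLL: Log-logistic; DistributionSPL: Smoothed power-law |
Lambda: $\lambda_w$ parameter of the probability distribution | LambdaDF: Nw/N, or the average number of documents where w occurs; LambdaTTF: Fw/N, or the average number of occurrences of w in the collection |
Normalization: Term frequency normalization | Any supported DFR normalization (listed in DFRSimilarity) |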
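A minimal sketch of choosing one implementation per component and attaching the result to an index configuration. It assumes the DistributionLL, LambdaDF, and NormalizationH2 classes from this namespace, Lucene.NET 4.8, and the StandardAnalyzer from the separate Lucene.Net.Analysis.Common package; the class name IBSimilarityConfigSketch is hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search.Similarities;
using Lucene.Net.Util;

public static class IBSimilarityConfigSketch
{
    public static void Main()
    {
        // One implementation for each of the three components.
        var similarity = new IBSimilarity(
            new DistributionLL(),    // log-logistic distribution
            new LambdaDF(),          // lambda based on document frequency
            new NormalizationH2());  // a DFR term frequency normalization

        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        // Attach the similarity to the writer configuration so that
        // index-time norms are computed with it.
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
        {
            Similarity = similarity
        };

        // Prints the similarity's name as produced by ToString().
        Console.WriteLine(config.Similarity);
    }
}
```

The same instance would normally also be assigned to `IndexSearcher.Similarity` at query time, so that indexing and searching agree on the model.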
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Search.Similarities
Assembly: Lucene.Net.dll
Syntax
public class IBSimilarity : SimilarityBase
Constructors
IBSimilarity(Distribution, Lambda, Normalization)
Creates IBSimilarity from the three components.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
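A short sketch of the no-normalization case, assuming Normalization.NoNormalization is a nested class with a public parameterless constructor (as in Lucene.NET 4.8) and using DistributionSPL and LambdaTTF purely for illustration. The class and method names here are hypothetical.

```csharp
using Lucene.Net.Search.Similarities;

public static class NoNormalizationSketch
{
    public static IBSimilarity Create()
    {
        // Pass the no-op normalization object rather than null.
        return new IBSimilarity(
            new DistributionSPL(),                 // smoothed power-law distribution
            new LambdaTTF(),                       // lambda based on total term frequency
            new Normalization.NoNormalization());  // leaves the term frequency unchanged
    }
}
```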
Declaration
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
Parameters
Type | Name | Description |
---|---|---|
Distribution | distribution | probabilistic distribution modeling term occurrence |
Lambda | lambda | distribution's $\lambda_w$ parameter |
Normalization | normalization | term frequency normalization |
Fields
m_distribution
The probabilistic distribution used to model term occurrence.
Declaration
protected readonly Distribution m_distribution
Field Value
Type | Description |
---|---|
Distribution |
m_lambda
The lambda ($\lambda_w$) parameter.
Declaration
protected readonly Lambda m_lambda
Field Value
Type | Description |
---|---|
Lambda |
m_normalization
The term frequency normalization.
Declaration
protected readonly Normalization m_normalization
Field Value
Type | Description |
---|---|
Normalization |
Properties
Distribution
Returns the distribution.
Declaration
public virtual Distribution Distribution { get; }
Property Value
Type | Description |
---|---|
Distribution |
Lambda
Returns the distribution's lambda parameter.
Declaration
public virtual Lambda Lambda { get; }
Property Value
Type | Description |
---|---|
Lambda |
Normalization
Returns the term frequency normalization.
Declaration
public virtual Normalization Normalization { get; }
Property Value
Type | Description |
---|---|
Normalization |
Methods
Explain(Explanation, BasicStats, int, float, float)
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Declaration
protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
Explanation | expl | the explanation to extend with details. |
BasicStats | stats | the corpus level statistics. |
int | doc | the document id. |
float | freq | the term frequency. |
float | docLen | the document length. |
Score(BasicStats, float, float)
Scores the document doc.
Subclasses must apply their scoring formula in this method.
Declaration
public override float Score(BasicStats stats, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
BasicStats | stats | the corpus level statistics. |
float | freq | the term frequency. |
float | docLen | the document length. |
Returns
Type | Description |
---|---|
float | the score. |
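The sketch below shows one way the three components could be combined into a score. It mirrors the shape of the retrieval function above and is not taken from this page. It assumes that the component classes expose Normalization.Tfn, Lambda.CalculateLambda, and Distribution.Score, and that BasicStats exposes TotalBoost, as in Lucene.NET 4.8; the helper name Compose is hypothetical.

```csharp
using Lucene.Net.Search.Similarities;

public static class IBScoreSketch
{
    // Hypothetical helper: combines the public components of an
    // IBSimilarity for one term in one document.
    public static float Compose(IBSimilarity sim, BasicStats stats, float freq, float docLen)
    {
        // Normalize the raw term frequency for document length.
        float tfn = sim.Normalization.Tfn(stats, freq, docLen);

        // Derive the distribution parameter from corpus statistics.
        float lambda = sim.Lambda.CalculateLambda(stats);

        // Weight the distribution's information content by the query boost.
        return stats.TotalBoost * sim.Distribution.Score(stats, tfn, lambda);
    }
}
```

Reading the components through the Distribution, Lambda, and Normalization properties keeps the helper independent of the protected fields, so it can live outside a subclass.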
ToString()
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the documentation of the Lambda classes.
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
string |