Class DFRSimilarity
Implements the divergence from randomness (DFR) framework introduced in: Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389.
The DFR scoring formula is composed of three separate components: the basic model, the after-effect and an additional normalization component, represented by the classes BasicModel, AfterEffect and Normalization, respectively. The names of these classes were chosen to match the names of their counterparts in the Terrier IR engine.
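As a sketch of how the three components fit together (our reading of DFRSimilarity.Score, not a formula quoted from the paper), write tf for the term frequency in the document, l for the document length and β for the total query boost; these symbols are ours, not part of the API. The normalization turns tf into a normalized term frequency tfn, and the other two components are evaluated at tfn and multiplied:

```latex
\begin{aligned}
tfn &= \mathrm{Normalization}(tf,\ l) \\
\mathrm{score} &= \beta \cdot \mathrm{BasicModel}(tfn) \cdot \mathrm{AfterEffect}(tfn)
\end{aligned}
```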
To construct a DFRSimilarity, you must specify the implementations for all three components of DFR:
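Component | Description | Implementations |
---|---|---|
BasicModel | Basic model of information content | BasicModelBE, BasicModelD, BasicModelG, BasicModelIF, BasicModelIn, BasicModelIne, BasicModelP |
AfterEffect | First normalization of information gain | AfterEffectB, AfterEffectL, AfterEffect.NoAfterEffect |
Normalization | Second (length) normalization | NormalizationH1, NormalizationH2, NormalizationH3, NormalizationZ, Normalization.NoNormalization |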
Note that qtf, the number of times a term occurs in the query, is not handled by this implementation.
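To make the table concrete, the sketch below writes out a few of the component formulas as we understand Lucene to compute them. The CorpusStats record, its field names and the DfrComponents class are hypothetical stand-ins, not Lucene.Net types, and the default parameters (c = 1 for H2, z = 0.3 for Z) are assumptions worth checking against the library source. Basic models and after-effects take the normalized frequency tfn; normalizations turn a raw tf and a document length into tfn.

```csharp
using System;

// Hypothetical stand-in for the corpus statistics Lucene passes in BasicStats.
// Only the quantities mirror what the DFR components read; the names are ours.
public sealed record CorpusStats(
    long NumberOfDocuments,   // N: documents in the collection
    long DocFreq,             // n: documents containing the term
    long TotalTermFreq,       // F: occurrences of the term in the collection
    double AvgFieldLength);   // average document (field) length

public static class DfrComponents
{
    static double Log2(double x) => Math.Log(x) / Math.Log(2.0);

    // Basic model In: inverse document frequency, scaled by tfn.
    public static double BasicModelIn(CorpusStats s, double tfn) =>
        tfn * Log2((s.NumberOfDocuments + 1.0) / (s.DocFreq + 0.5));

    // Basic model G: geometric approximation of Bose-Einstein.
    public static double BasicModelG(CorpusStats s, double tfn)
    {
        double f = s.TotalTermFreq + 1.0;
        double lambda = f / (s.NumberOfDocuments + f);
        return Log2(lambda + 1.0) + tfn * Log2((1.0 + lambda) / lambda);
    }

    // After-effect L: Laplace's law of succession.
    public static double AfterEffectL(double tfn) => 1.0 / (tfn + 1.0);

    // After-effect B: ratio of two Bernoulli processes.
    public static double AfterEffectB(CorpusStats s, double tfn)
    {
        double f = s.TotalTermFreq + 1.0;
        double n = s.DocFreq + 1.0;
        return (f + 1.0) / (n * (tfn + 1.0));
    }

    // Normalization H1: uniform distribution of term frequency.
    public static double NormalizationH1(CorpusStats s, double tf, double len) =>
        tf * s.AvgFieldLength / len;

    // Normalization H2: term frequency density inversely related to length.
    public static double NormalizationH2(CorpusStats s, double tf, double len, double c = 1.0) =>
        tf * Log2(1.0 + c * s.AvgFieldLength / len);

    // Normalization Z: Zipfian relation with exponent z.
    public static double NormalizationZ(CorpusStats s, double tf, double len, double z = 0.3) =>
        tf * Math.Pow(s.AvgFieldLength / len, z);
}
```

For a document of exactly average length, H1, H2 (with c = 1) and Z all return tf unchanged, which makes a quick sanity check on the formulas.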
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Search.Similarities
Assembly: Lucene.Net.dll
Syntax
public class DFRSimilarity : SimilarityBase
Constructors
DFRSimilarity(BasicModel, AfterEffect, Normalization)
Creates DFRSimilarity from the three components.
Note that null values are not allowed: if you want no normalization or after-effect, pass Normalization.NoNormalization or AfterEffect.NoAfterEffect, respectively, instead.
Declaration
public DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization)
Parameters
Type | Name | Description |
---|---|---|
BasicModel | basicModel | Basic model of information content |
AfterEffect | afterEffect | First normalization of information gain |
Normalization | normalization | Second (length) normalization |
Exceptions
Type | Condition |
---|---|
ArgumentNullException | basicModel, afterEffect or normalization is null. |
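The constructor's null-object convention can be illustrated with a small, self-contained sketch. IBasicModelSketch, IAfterEffectSketch, INormalizationSketch, NoAfterEffectSketch, NoNormalizationSketch and DfrSketch are hypothetical types, not the Lucene.Net classes. The assumption that the "no" components act as neutral elements (an after-effect of 1 and a normalization that returns tf unchanged) reflects our reading of AfterEffect.NoAfterEffect and Normalization.NoNormalization.

```csharp
using System;

// Hypothetical, simplified component contracts (not the Lucene.Net abstract classes).
public interface IBasicModelSketch { double Score(double tfn); }
public interface IAfterEffectSketch { double Score(double tfn); }
public interface INormalizationSketch { double Tfn(double tf, double docLen); }

// Neutral after-effect: multiplying by 1 leaves the basic-model score unchanged
// (assumed to mirror AfterEffect.NoAfterEffect).
public sealed class NoAfterEffectSketch : IAfterEffectSketch
{
    public double Score(double tfn) => 1.0;
}

// Neutral normalization: the raw term frequency is used as tfn
// (assumed to mirror Normalization.NoNormalization).
public sealed class NoNormalizationSketch : INormalizationSketch
{
    public double Tfn(double tf, double docLen) => tf;
}

public sealed class DfrSketch
{
    private readonly IBasicModelSketch basicModel;
    private readonly IAfterEffectSketch afterEffect;
    private readonly INormalizationSketch normalization;

    public DfrSketch(IBasicModelSketch basicModel, IAfterEffectSketch afterEffect, INormalizationSketch normalization)
    {
        // Same contract as the documented constructor: null is rejected
        // instead of being silently treated as "no component".
        this.basicModel = basicModel ?? throw new ArgumentNullException(nameof(basicModel));
        this.afterEffect = afterEffect ?? throw new ArgumentNullException(nameof(afterEffect));
        this.normalization = normalization ?? throw new ArgumentNullException(nameof(normalization));
    }

    // tfn is computed once and shared by the basic model and the after-effect.
    public double Score(double tf, double docLen)
    {
        double tfn = normalization.Tfn(tf, docLen);
        return basicModel.Score(tfn) * afterEffect.Score(tfn);
    }
}
```

One plausible reason for this design is that explicit neutral objects keep the scoring path free of null checks, at the cost of requiring callers to opt out of a component explicitly.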
Fields
m_afterEffect
The first normalization of the information content.
Declaration
protected readonly AfterEffect m_afterEffect
Field Value
Type | Description |
---|---|
AfterEffect |
m_basicModel
The basic model for information content.
Declaration
protected readonly BasicModel m_basicModel
Field Value
Type | Description |
---|---|
BasicModel |
m_normalization
The term frequency normalization.
Declaration
protected readonly Normalization m_normalization
Field Value
Type | Description |
---|---|
Normalization |
Properties
AfterEffect
Returns the first normalization (the after-effect).
Declaration
public virtual AfterEffect AfterEffect { get; }
Property Value
Type | Description |
---|---|
AfterEffect |
BasicModel
Returns the basic model of information content.
Declaration
public virtual BasicModel BasicModel { get; }
Property Value
Type | Description |
---|---|
BasicModel |
Normalization
Returns the second (length) normalization.
Declaration
public virtual Normalization Normalization { get; }
Property Value
Type | Description |
---|---|
Normalization |
Methods
Explain(Explanation, BasicStats, int, float, float)
Adds the details of the DFR scoring formula to expl. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation. DFRSimilarity appends the total boost (only when it differs from 1), the explanation of the normalization, which yields the normalized term frequency tfn, and the explanations of the basic model and the after-effect, both evaluated at tfn.
Declaration
protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
Explanation | expl | the explanation to extend with details. |
BasicStats | stats | the corpus-level statistics. |
int | doc | the document id. |
float | freq | the term frequency. |
float | docLen | the document length. |
Overrides
SimilarityBase.Explain(Explanation, BasicStats, int, float, float)
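The order in which DFRSimilarity fills in the explanation can be sketched as follows. ExplanationNode is a hypothetical stand-in for Lucene.Net's Explanation, the delegates stand in for the three components, and the detail descriptions are illustrative; the point is that tfn is computed once by the normalization and then reused for the basic model and the after-effect.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal explanation node (Lucene.Net has its own Explanation class).
public sealed class ExplanationNode
{
    public ExplanationNode(double value, string description)
    {
        Value = value;
        Description = description;
    }

    public double Value { get; }
    public string Description { get; }
    public List<ExplanationNode> Details { get; } = new List<ExplanationNode>();

    public void AddDetail(ExplanationNode detail) => Details.Add(detail);
}

public static class DfrExplainSketch
{
    // Appends DFR-specific details to an explanation that already carries the score.
    // The components are passed as delegates purely for illustration.
    public static void Explain(
        ExplanationNode expl,
        double totalBoost,
        double freq,
        double docLen,
        Func<double, double, double> normalization,
        Func<double, double> basicModel,
        Func<double, double> afterEffect)
    {
        // The boost is only reported when it actually changes the score.
        if (totalBoost != 1.0)
            expl.AddDetail(new ExplanationNode(totalBoost, "boost"));

        // tfn is computed once and reused by the other two components.
        double tfn = normalization(freq, docLen);
        expl.AddDetail(new ExplanationNode(tfn, "normalized term frequency (tfn)"));
        expl.AddDetail(new ExplanationNode(basicModel(tfn), "basic model"));
        expl.AddDetail(new ExplanationNode(afterEffect(tfn), "after-effect"));
    }
}
```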
Score(BasicStats, float, float)
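Scores a document for a single term from the corpus-level statistics, the term frequency and the document length. The normalization first turns freq into a normalized term frequency tfn; the score is the total boost multiplied by the basic model score and the after-effect score, both evaluated at tfn.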
Declaration
public override float Score(BasicStats stats, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
BasicStats | stats | the corpus-level statistics. |
float | freq | the term frequency. |
float | docLen | the document length. |
Returns
Type | Description |
---|---|
float | the score. |
Overrides
SimilarityBase.Score(BasicStats, float, float)
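As a worked sketch of Score, the method below hard-codes one combination: basic model In, after-effect L and normalization H2 with c = 1, using the formulas as we understand Lucene to implement them. DfrScoreSketch and its parameter names are hypothetical; the real method reads these values from BasicStats.

```csharp
using System;

public static class DfrScoreSketch
{
    static double Log2(double x) => Math.Log(x) / Math.Log(2.0);

    // Hypothetical helper: scores one term in one document with In + L + H2 (c = 1),
    // following score = boost * basicModel(tfn) * afterEffect(tfn).
    public static double ScoreInLH2(
        long numberOfDocuments,   // N
        long docFreq,             // n
        double avgFieldLength,
        double totalBoost,
        double freq,
        double docLen)
    {
        // Second (length) normalization H2.
        double tfn = freq * Log2(1.0 + avgFieldLength / docLen);
        // Basic model In (inverse document frequency).
        double basic = tfn * Log2((numberOfDocuments + 1.0) / (docFreq + 0.5));
        // After-effect L (Laplace's law of succession).
        double after = 1.0 / (tfn + 1.0);
        return totalBoost * basic * after;
    }
}
```

For example, with N = 59 documents, a document frequency of 7, freq = 3 and a document of exactly average length, tfn = 3, the basic model gives 3 * log2(60 / 7.5) = 9, the after-effect gives 1/4, and the score with boost 1 is 2.25.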
ToString()
Returns the name of this similarity: "DFR " followed by the names of the basic model, the after-effect and the normalization, as returned by their ToString() methods.
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
string | the name of this similarity. |
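The naming convention can be sketched as plain string concatenation. DfrNameSketch is hypothetical, and the short component names used below (G, B, 2) are assumptions about what the component classes return from ToString(), not documented values.

```csharp
public static class DfrNameSketch
{
    // Hypothetical helper: concatenates the prefix with each component's name,
    // with no separators between components. A component whose name is empty
    // simply contributes nothing to the result.
    public static string Name(string basicModel, string afterEffect, string normalization) =>
        "DFR " + basicModel + afterEffect + normalization;
}
```

Under those assumptions, DfrNameSketch.Name("G", "B", "2") yields "DFR GB2".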