Class BM25Similarity
BM25 similarity, introduced by Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford in "Okapi at TREC-3", Proceedings of the Third Text REtrieval Conference (TREC 1994), Gaithersburg, USA, November 1994.
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Search.Similarities
Assembly: Lucene.Net.dll
Syntax
public class BM25Similarity : Similarity
Constructors
BM25Similarity()
BM25 with these default values: k1 = 1.2, b = 0.75.
Declaration
public BM25Similarity()
BM25Similarity(Single, Single)
BM25 with the supplied parameter values.
Declaration
public BM25Similarity(float k1, float b)
Parameters
Type | Name | Description |
---|---|---|
System.Single | k1 | Controls non-linear term frequency normalization (saturation). |
System.Single | b | Controls to what degree document length normalizes tf values. |
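To make the roles of k1 and b concrete, here is a minimal standalone sketch of the classic per-term BM25 formula these two parameters plug into: k1 caps how much repeated occurrences of a term can add (saturation), and b sets how strongly a document's length relative to the average length scales down its term frequency. The class and method names (Bm25Sketch, TfComponent, TermScore) are hypothetical; this is the textbook formula under those assumptions, not the Lucene.Net implementation, which works from encoded norms and precomputed statistics.

```csharp
using System;

// Hypothetical helper illustrating the textbook BM25 per-term score.
public static class Bm25Sketch
{
    // Same defaults as the parameterless BM25Similarity constructor.
    public const float DefaultK1 = 1.2f;
    public const float DefaultB = 0.75f;

    // tf component: freq * (k1 + 1) / (freq + k1 * ((1 - b) + b * docLength / avgDocLength)).
    // As freq grows the value approaches k1 + 1, so k1 controls saturation.
    public static float TfComponent(float freq, float docLength, float avgDocLength, float k1, float b)
    {
        // b = 0 disables length normalization; b = 1 applies it fully.
        float lengthNorm = k1 * ((1 - b) + b * docLength / avgDocLength);
        return freq * (k1 + 1) / (freq + lengthNorm);
    }

    // Per-term score: idf multiplied by the saturated, length-normalized tf.
    public static float TermScore(float idf, float freq, float docLength, float avgDocLength,
                                  float k1 = DefaultK1, float b = DefaultB)
    {
        return idf * TfComponent(freq, docLength, avgDocLength, k1, b);
    }
}
```

For a document of exactly average length, a single occurrence gives a tf component of 1 with the defaults; a field twice the average length scores lower for the same frequency, and with b = 0 the length has no effect at all.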
Properties
B
Returns the b parameter.
Declaration
public virtual float B { get; }
Property Value
Type | Description |
---|---|
System.Single |
DiscountOverlaps
Gets or sets whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count toward the field length when computing norms.
Declaration
public virtual bool DiscountOverlaps { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
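The effect of DiscountOverlaps on the length that feeds the norm can be sketched as below. It assumes, as in Lucene 4.x, that the indexer records both the total token count of the field and how many of those tokens were overlaps; the helper name EffectiveFieldLength is hypothetical.

```csharp
using System;

public static class NormLengthSketch
{
    // Hypothetical helper: the length used for the norm, given the raw token count
    // and how many of those tokens had a position increment of 0 (e.g. injected synonyms).
    public static int EffectiveFieldLength(int tokenCount, int overlapCount, bool discountOverlaps)
    {
        if (overlapCount < 0 || overlapCount > tokenCount)
            throw new ArgumentOutOfRangeException(nameof(overlapCount));
        // With DiscountOverlaps = true (the default), stacked tokens do not lengthen the field.
        return discountOverlaps ? tokenCount - overlapCount : tokenCount;
    }
}
```

For example, "fast car" with the synonym "quick" stacked on "fast" produces three tokens, one of them an overlap: the norm sees a length of 2 by default and 3 when DiscountOverlaps is false.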
K1
Returns the k1 parameter.
Declaration
public virtual float K1 { get; }
Property Value
Type | Description |
---|---|
System.Single |
Methods
AvgFieldLength(CollectionStatistics)
The default implementation computes the average as sumTotalTermFreq / maxDoc, or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes or any field that omits frequency information).
Declaration
protected virtual float AvgFieldLength(CollectionStatistics collectionStats)
Parameters
Type | Name | Description |
---|---|---|
CollectionStatistics | collectionStats |
Returns
Type | Description |
---|---|
System.Single |
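The documented default can be sketched as a small standalone function. It assumes that a missing sumTotalTermFreq is reported as a non-positive value (Lucene uses -1 for "not available"); the class name is hypothetical.

```csharp
using System;

public static class AvgLengthSketch
{
    // sumTotalTermFreq / maxDoc, or 1 when the statistic is unavailable.
    // Assumption: "unavailable" is signalled by a non-positive value (Lucene reports -1).
    public static float AvgFieldLength(long sumTotalTermFreq, long maxDoc)
    {
        if (sumTotalTermFreq <= 0)
            return 1f;
        // Divide in double precision before narrowing to float.
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }
}
```

Falling back to 1 keeps the length-normalization term finite when no statistics exist.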
ComputeNorm(FieldInvertState)
Declaration
public sealed override long ComputeNorm(FieldInvertState state)
Parameters
Type | Name | Description |
---|---|---|
FieldInvertState | state |
Returns
Type | Description |
---|---|
System.Int64 |
Overrides
Similarity.ComputeNorm(FieldInvertState)
ComputeWeight(Single, CollectionStatistics, TermStatistics[])
Declaration
public sealed override Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
Parameters
Type | Name | Description |
---|---|---|
System.Single | queryBoost | |
CollectionStatistics | collectionStats | |
TermStatistics[] | termStats |
Returns
Type | Description |
---|---|
Similarity.SimWeight |
Overrides
Similarity.ComputeWeight(Single, CollectionStatistics, TermStatistics[])
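Because norms are stored as single bytes, only 256 distinct decoded lengths can occur, so the length-dependent part of the BM25 denominator can be computed once per query and then looked up per document. The sketch below assumes the weight is prepared this way (Lucene 4.x's Java BM25Similarity builds such a 256-entry table when computing the weight); the decodeLength delegate stands in for DecodeNormValue(Byte), and the class name Bm25WeightSketch is hypothetical.

```csharp
using System;

public sealed class Bm25WeightSketch
{
    private readonly float[] normCache = new float[256];
    private readonly float weightValue;

    // idf and queryBoost are per query; avgFieldLength comes from collection statistics.
    // decodeLength maps a stored norm byte back to an approximate field length.
    public Bm25WeightSketch(float idf, float queryBoost, float avgFieldLength,
                            float k1, float b, Func<byte, float> decodeLength)
    {
        for (int i = 0; i < normCache.Length; i++)
        {
            // Frequency-independent part of the denominator for every possible norm byte.
            normCache[i] = k1 * ((1 - b) + b * decodeLength((byte)i) / avgFieldLength);
        }
        // Everything in the numerator except freq.
        weightValue = idf * queryBoost * (k1 + 1);
    }

    // Per-document score: one table lookup plus one division.
    public float Score(byte normByte, float freq) => weightValue * freq / (freq + normCache[normByte]);
}
```

The design trades 256 floats per query for avoiding a decode and several multiplications on every matching document.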
DecodeNormValue(Byte)
The default implementation returns 1 / f², where f is Byte315ToSingle(Byte).
Declaration
protected virtual float DecodeNormValue(byte b)
Parameters
Type | Name | Description |
---|---|---|
System.Byte | b |
Returns
Type | Description |
---|---|
System.Single |
EncodeNormValue(Single, Int32)
The default implementation encodes boost / sqrt(length) with SingleToByte315(Single). This is compatible with Lucene's default implementation. If you change this, then you should change DecodeNormValue(Byte) to match.
Declaration
protected virtual byte EncodeNormValue(float boost, int fieldLength)
Parameters
Type | Name | Description |
---|---|---|
System.Single | boost | |
System.Int32 | fieldLength |
Returns
Type | Description |
---|---|
System.Byte |
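The norm round trip described by EncodeNormValue and DecodeNormValue can be sketched without Lucene.Net. The byte format is assumed to be the small-float layout that Lucene documents as "3 mantissa bits, 5 exponent bits, zero-exponent point 15"; the port below follows Lucene 4.x's SmallFloat bit manipulation and is illustrative rather than the library's own code. The class name NormCodecSketch is hypothetical.

```csharp
using System;

public static class NormCodecSketch
{
    // Reinterpret float bits as int and back.
    private static int ToBits(float f) => BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
    private static float FromBits(int bits) => BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);

    public static byte SingleToByte315(float f)
    {
        int bits = ToBits(f);
        // Drop the low bits, keeping the sign, exponent and highest fraction bits.
        int smallFloat = bits >> (24 - 3);
        const int zeroExp = (63 - 15) << 3;
        if (smallFloat <= zeroExp)
            return bits <= 0 ? (byte)0 : (byte)1; // zero/negative -> 0; tiny positive -> 1
        if (smallFloat >= zeroExp + 0x100)
            return 255;                          // too large: clamp to the largest code
        return (byte)(smallFloat - zeroExp);
    }

    public static float Byte315ToSingle(byte b)
    {
        if (b == 0) return 0f;
        int bits = b << (24 - 3);
        bits += (63 - 15) << 24;
        return FromBits(bits);
    }

    // Documented defaults: encode boost / sqrt(length), decode as 1 / f^2.
    public static byte EncodeNormValue(float boost, int fieldLength) =>
        SingleToByte315(boost / (float)Math.Sqrt(fieldLength));

    public static float DecodeNormValue(byte b)
    {
        float f = Byte315ToSingle(b);
        return 1f / (f * f); // approximately length / boost^2
    }
}
```

With boost 1, a length of 4 round-trips exactly, a length of 3 comes back as 4, and a length of 10 comes back as 10.24, which is why BM25 scores see only a coarse approximation of the true field length.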
GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
Declaration
public sealed override Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
Parameters
Type | Name | Description |
---|---|---|
Similarity.SimWeight | stats | |
AtomicReaderContext | context |
Returns
Type | Description |
---|---|
Similarity.SimScorer |
Overrides
Similarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
Idf(Int64, Int64)
Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)), using the natural logarithm.
Declaration
protected virtual float Idf(long docFreq, long numDocs)
Parameters
Type | Name | Description |
---|---|---|
System.Int64 | docFreq | |
System.Int64 | numDocs |
Returns
Type | Description |
---|---|
System.Single |
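A direct transcription of the documented formula, as a standalone sketch (the class name IdfSketch is hypothetical):

```csharp
using System;

public static class IdfSketch
{
    // log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5)), natural log.
    // The leading "1 +" keeps the result positive even for very common terms.
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }
}
```

Rare terms get a large idf and terms present in every document get a value close to, but above, zero. Dropping the "1 +" gives the classic Robertson/Spärck Jones form, which turns negative once a term appears in more than half of the documents.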
IdfExplain(CollectionStatistics, TermStatistics)
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
Idf(docFreq, searcher.MaxDoc);
Note that MaxDoc is used instead of NumDocs because DocFreq is also used, and when the latter is inaccurate (both still count deleted documents), so is MaxDoc, and in the same direction. In addition, MaxDoc is more efficient to compute.
Declaration
public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Parameters
Type | Name | Description |
---|---|---|
CollectionStatistics | collectionStats | collection-level statistics |
TermStatistics | termStats | term-level statistics for the term |
Returns
Type | Description |
---|---|
Explanation | an Explanation object that includes both an idf score factor and an explanation for the term. |
IdfExplain(CollectionStatistics, TermStatistics[])
Computes a score factor for a phrase.
The default implementation sums the idf factor for each term in the phrase.
Declaration
public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)
Parameters
Type | Name | Description |
---|---|---|
CollectionStatistics | collectionStats | collection-level statistics |
TermStatistics[] | termStats | term-level statistics for the terms in the phrase |
Returns
Type | Description |
---|---|
Explanation | an Explanation object that includes both an idf score factor for the phrase and an explanation for each term. |
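The phrase variant can be sketched as a plain sum of per-term idf values, each computed against maxDoc as described for the single-term overload. The class name PhraseIdfSketch is hypothetical, and the sketch returns only the number, not an Explanation.

```csharp
using System;
using System.Linq;

public static class PhraseIdfSketch
{
    // Same per-term formula as Idf(Int64, Int64).
    private static float Idf(long docFreq, long numDocs) =>
        (float)Math.Log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));

    // Phrase idf: the sum of the per-term idf values, each against maxDoc.
    public static float PhraseIdf(long maxDoc, params long[] docFreqs) =>
        docFreqs.Sum(df => Idf(df, maxDoc));
}
```

Because the factors add up, a phrase containing a rare term is weighted more heavily than one made only of common terms.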
ScorePayload(Int32, Int32, Int32, BytesRef)
The default implementation returns 1.
Declaration
protected virtual float ScorePayload(int doc, int start, int end, BytesRef payload)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | doc | |
System.Int32 | start | |
System.Int32 | end | |
BytesRef | payload |
Returns
Type | Description |
---|---|
System.Single |
SloppyFreq(Int32)
Implemented as 1 / (distance + 1).
Declaration
protected virtual float SloppyFreq(int distance)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | distance |
Returns
Type | Description |
---|---|
System.Single |
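The documented decay is easy to tabulate. The sketch below also assumes, for illustration, that a sloppy phrase's frequency is the sum of SloppyFreq over its individual matches, and that this sum then stands in for the raw term frequency in the BM25 tf component. The class and the PhraseFreq helper are hypothetical.

```csharp
using System;

public static class SloppyFreqSketch
{
    // Documented default: an exact match (distance 0) counts 1,
    // and each extra position of slop shrinks the contribution.
    public static float SloppyFreq(int distance) => 1.0f / (distance + 1);

    // Assumption for illustration: phrase frequency = sum over matches of SloppyFreq(distance).
    public static float PhraseFreq(params int[] matchDistances)
    {
        float freq = 0f;
        foreach (int d in matchDistances)
            freq += SloppyFreq(d);
        return freq;
    }
}
```

Under this assumption, one exact match plus one match that is one position off yields a phrase frequency of 1.5.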
ToString()
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
System.String |