Class BM25Similarity
BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. In Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994.
Note
This API is experimental and might change in incompatible ways in the next release.
Inherited Members
Namespace: Lucene.Net.Search.Similarities
Assembly: Lucene.Net.dll
Syntax
public class BM25Similarity : Similarity
Constructors
BM25Similarity()
BM25 with these default values:
k1 = 1.2, b = 0.75.
Declaration
public BM25Similarity()
BM25Similarity(Single, Single)
BM25 with the supplied parameter values.
Declaration
public BM25Similarity(float k1, float b)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Single | k1 | Controls non-linear term frequency normalization (saturation). |
| System.Single | b | Controls to what degree document length normalizes tf values. |
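To make the roles of k1 and b concrete, here is a minimal sketch of the textbook BM25 term-frequency factor. It is not the Lucene.NET source: the class `Bm25TfSketch`, the method `TfNorm`, and its parameter names are hypothetical, and the library's internal computation (which works from encoded norms) may differ in detail.

```csharp
using System;

// Illustrative sketch only; names are hypothetical, not part of Lucene.NET.
public static class Bm25TfSketch
{
    // Textbook BM25 term-frequency factor:
    //   freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    // k1 bounds how far repeated occurrences can raise the factor (saturation);
    // b sets how strongly the field length rescales the term frequency.
    public static double TfNorm(double freq, double fieldLength, double avgFieldLength,
                                double k1 = 1.2, double b = 0.75)
    {
        double lengthFactor = 1.0 - b + b * (fieldLength / avgFieldLength);
        return freq * (k1 + 1.0) / (freq + k1 * lengthFactor);
    }
}
```

Under these assumptions, `Bm25TfSketch.TfNorm(1, 10, 10)` evaluates to 1 for a field of average length, the factor never exceeds k1 + 1 however large `freq` grows, and with b = 0 the field length has no effect at all.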
Properties
B
Returns the b parameter.
Declaration
public virtual float B { get; }
Property Value
| Type | Description |
|---|---|
| System.Single |
See Also
DiscountOverlaps
Gets or sets whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count toward the norm.
Declaration
public virtual bool DiscountOverlaps { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Boolean |
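The effect of this flag on the field length that feeds the norm can be sketched as follows. This is a simplified illustration, not the library code: `NormLengthSketch` and `EffectiveLength` are hypothetical names, and the two integer inputs stand in for the token counts that the indexing chain would supply.

```csharp
using System;

// Illustrative sketch only; names are hypothetical, not part of Lucene.NET.
public static class NormLengthSketch
{
    // length:     total number of tokens indexed for the field
    // numOverlap: tokens with a position increment of 0 (e.g. injected synonyms)
    public static int EffectiveLength(int length, int numOverlap, bool discountOverlaps)
    {
        // When overlaps are discounted, they are subtracted from the field length.
        return discountOverlaps ? length - numOverlap : length;
    }
}
```

For a field with 10 tokens, 2 of which sit at the same position as another token, the sketch yields a length of 8 with the default setting and 10 when the property is set to false.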
K1
Returns the k1 parameter.
Declaration
public virtual float K1 { get; }
Property Value
| Type | Description |
|---|---|
| System.Single |
See Also
Methods
AvgFieldLength(CollectionStatistics)
The default implementation computes the average as sumTotalTermFreq / maxDoc,
or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes
or any field that omits frequency information).
Declaration
protected virtual float AvgFieldLength(CollectionStatistics collectionStats)
Parameters
| Type | Name | Description |
|---|---|---|
| CollectionStatistics | collectionStats |
Returns
| Type | Description |
|---|---|
| System.Single |
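The rule described above can be sketched in a few lines. This is an approximation under stated assumptions rather than the library source: `AvgFieldLengthSketch` is a hypothetical name, the two `long` parameters stand in for the values carried by `CollectionStatistics`, and a non-positive `sumTotalTermFreq` is assumed to signal that the statistic is not stored.

```csharp
using System;

// Illustrative sketch only; names are hypothetical, not part of Lucene.NET.
public static class AvgFieldLengthSketch
{
    public static float AvgFieldLength(long sumTotalTermFreq, long maxDoc)
    {
        // Assumption: a non-positive value means the statistic is unavailable.
        if (sumTotalTermFreq <= 0)
        {
            return 1f;
        }
        // Divide in floating point so the average is not truncated to an integer.
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }
}
```

With 7 term occurrences across 2 documents the sketch returns 3.5, and with an unavailable statistic it falls back to 1.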
ComputeNorm(FieldInvertState)
Declaration
public sealed override long ComputeNorm(FieldInvertState state)
Parameters
| Type | Name | Description |
|---|---|---|
| FieldInvertState | state |
Returns
| Type | Description |
|---|---|
| System.Int64 |
Overrides
ComputeWeight(Single, CollectionStatistics, TermStatistics[])
Declaration
public sealed override Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Single | queryBoost | |
| CollectionStatistics | collectionStats | |
| TermStatistics[] | termStats | |
Returns
| Type | Description |
|---|---|
| Similarity.SimWeight |
Overrides
DecodeNormValue(Byte)
The default implementation returns 1 / f^2,
where f is the value decoded by Byte315ToSingle(Byte).
Declaration
protected virtual float DecodeNormValue(byte b)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte | b |
Returns
| Type | Description |
|---|---|
| System.Single |
EncodeNormValue(Single, Int32)
The default implementation encodes boost / sqrt(length)
with SingleToByte315(Single). This is compatible with
Lucene's default implementation. If you change this, then you should
change DecodeNormValue(Byte) to match.
Declaration
protected virtual byte EncodeNormValue(float boost, int fieldLength)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Single | boost | |
| System.Int32 | fieldLength | |
Returns
| Type | Description |
|---|---|
| System.Byte |
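The relationship between the two default implementations can be illustrated with plain floating-point arithmetic. The sketch below deliberately leaves out the single-byte quantization step, so it only shows why decoding inverts encoding; in the real round trip the byte encoding is lossy. `NormValueSketch`, `NormFactor`, and `DecodedLength` are hypothetical names.

```csharp
using System;

// Illustrative sketch only; names are hypothetical, not part of Lucene.NET.
// The byte quantization step is omitted on purpose.
public static class NormValueSketch
{
    // Value that would be handed to the byte encoder: boost / sqrt(length).
    public static float NormFactor(float boost, int fieldLength)
    {
        return boost / (float)Math.Sqrt(fieldLength);
    }

    // Value recovered on the decode side: 1 / f^2.
    public static float DecodedLength(float f)
    {
        return 1f / (f * f);
    }
}
```

With a boost of 1, `DecodedLength(NormFactor(1f, 16))` gives back 16, the original field length. With a boost of 2 the same field decodes to 4, that is, length / boost^2, so a higher boost makes the field look shorter to the scorer.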
GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
Declaration
public sealed override Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
Parameters
| Type | Name | Description |
|---|---|---|
| Similarity.SimWeight | stats | |
| AtomicReaderContext | context | |
Returns
| Type | Description |
|---|---|
| Similarity.SimScorer |
Overrides
Idf(Int64, Int64)
Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)).
Declaration
protected virtual float Idf(long docFreq, long numDocs)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int64 | docFreq | |
| System.Int64 | numDocs | |
Returns
| Type | Description |
|---|---|
| System.Single |
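The formula above translates directly into code. The following is a standalone sketch of that expression, assuming the natural logarithm; `IdfSketch` is a hypothetical name and the class is independent of Lucene.NET.

```csharp
using System;

// Illustrative sketch only; the class name is hypothetical, not part of Lucene.NET.
public static class IdfSketch
{
    // log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5))
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5D) / (docFreq + 0.5D));
    }
}
```

Because of the `1 +` inside the logarithm, the result stays positive even for a term that occurs in every document, and it decreases as `docFreq` grows, so rarer terms contribute more to the score.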
IdfExplain(CollectionStatistics, TermStatistics)
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
Idf(termStats.DocFreq, collectionStats.MaxDoc);
Note that MaxDoc is used instead of NumDocs because DocFreq is also used: when DocFreq is inaccurate (for example, because it still counts deleted documents), MaxDoc is inaccurate in the same direction. In addition, MaxDoc is cheaper to compute.
Declaration
public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Parameters
| Type | Name | Description |
|---|---|---|
| CollectionStatistics | collectionStats | collection-level statistics |
| TermStatistics | termStats | term-level statistics for the term |
Returns
| Type | Description |
|---|---|
| Explanation | an Explanation object that includes both an idf score factor and an explanation for the term. |
IdfExplain(CollectionStatistics, TermStatistics[])
Computes a score factor for a phrase.
The default implementation sums the idf factor for each term in the phrase.
Declaration
public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)
Parameters
| Type | Name | Description |
|---|---|---|
| CollectionStatistics | collectionStats | collection-level statistics |
| TermStatistics[] | termStats | term-level statistics for the terms in the phrase |
Returns
| Type | Description |
|---|---|
| Explanation | an Explanation object that includes both an idf score factor for the phrase and an explanation for each term. |
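The summation for a phrase can be sketched as below. This is a simplified illustration that returns only the numeric factor and omits the `Explanation` tree; `PhraseIdfSketch` and `PhraseIdf` are hypothetical names, and the `docFreqs` array stands in for the per-term statistics.

```csharp
using System;

// Illustrative sketch only; names are hypothetical, not part of Lucene.NET.
public static class PhraseIdfSketch
{
    // Same expression as the single-term idf.
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    // Sums the idf of every term in the phrase.
    public static float PhraseIdf(long numDocs, params long[] docFreqs)
    {
        float sum = 0f;
        foreach (long docFreq in docFreqs)
        {
            sum += Idf(docFreq, numDocs);
        }
        return sum;
    }
}
```

For example, `PhraseIdfSketch.PhraseIdf(1000, 10, 20)` is the idf of a term found in 10 of 1000 documents plus the idf of a term found in 20 of them.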
ScorePayload(Int32, Int32, Int32, BytesRef)
The default implementation returns 1.
Declaration
protected virtual float ScorePayload(int doc, int start, int end, BytesRef payload)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int32 | doc | |
| System.Int32 | start | |
| System.Int32 | end | |
| BytesRef | payload | |
Returns
| Type | Description |
|---|---|
| System.Single |
SloppyFreq(Int32)
Implemented as 1 / (distance + 1).
Declaration
protected virtual float SloppyFreq(int distance)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int32 | distance |
Returns
| Type | Description |
|---|---|
| System.Single |
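As a quick illustration of how the factor falls off with distance, here is a standalone sketch of the expression above. `SloppyFreqSketch` is a hypothetical name and the class does not depend on Lucene.NET.

```csharp
using System;

// Illustrative sketch only; the class name is hypothetical, not part of Lucene.NET.
public static class SloppyFreqSketch
{
    // 1 / (distance + 1): an exact match (distance 0) counts fully,
    // and each extra position of slop reduces the contribution.
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }
}
```

A distance of 0 yields 1, a distance of 1 yields 0.5, and a distance of 3 yields 0.25.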
ToString()
Declaration
public override string ToString()
Returns
| Type | Description |
|---|---|
| System.String |