Class BM25Similarity
BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker,
Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3.
In Proceedings of the Third Text REtrieval Conference (TREC 1994).
Gaithersburg, USA, November 1994.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
Inheritance
System.Object
Similarity
BM25Similarity
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Assembly: Lucene.Net.dll
Syntax
public class BM25Similarity : Similarity
Constructors
BM25Similarity()
BM25 with these default values: k1 = 1.2, b = 0.75.
Declaration
public BM25Similarity()
BM25Similarity(Single, Single)
BM25 with the supplied parameter values.
Declaration
public BM25Similarity(float k1, float b)
Parameters
Type | Name | Description
System.Single | k1 | Controls non-linear term frequency normalization (saturation).
System.Single | b | Controls to what degree document length normalizes tf values.
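To make the roles of k1 and b concrete, the sketch below evaluates the term-frequency factor of the textbook Okapi BM25 formula. It is an illustration under stated assumptions: the names Bm25TfSketch and TfFactor are hypothetical, and this page does not confirm that BM25Similarity computes its score in exactly this form.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET: the textbook BM25 term-frequency factor.
public static class Bm25TfSketch
{
    // Computes tf * (k1 + 1) / (tf + k1 * (1 - b + b * fieldLength / avgFieldLength)).
    public static double TfFactor(double tf, double k1, double b,
                                  double fieldLength, double avgFieldLength)
    {
        // b = 0 disables length normalization; b = 1 applies it fully.
        double lengthNorm = 1.0 - b + b * (fieldLength / avgFieldLength);

        // A larger k1 delays saturation; the factor stays below k1 + 1.
        return tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
    }
}
```

With k1 = 1.2 and b = 0.75, a term that occurs once in a field of average length gives a factor of 1.0, and the factor approaches k1 + 1 = 2.2 as tf grows.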
Properties
B
Declaration
public virtual float B { get; }
Property Value
Type | Description
System.Single |
DiscountOverlaps
Gets or sets whether overlap tokens (tokens with a position increment of 0) are
ignored when computing the norm. The default is true, meaning overlap
tokens do not count when computing norms.
Declaration
public virtual bool DiscountOverlaps { get; set; }
Property Value
Type | Description
System.Boolean |
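The effect of this flag on the length that feeds the norm can be sketched as follows. This is a minimal illustration, assuming the norm is derived from a token count; NormLengthSketch, EffectiveLength, and the parameter names are hypothetical and not part of the Lucene.NET API.

```csharp
// Hypothetical helper, not part of Lucene.NET.
public static class NormLengthSketch
{
    // length: all tokens indexed for the field.
    // numOverlap: tokens with a position increment of 0 (for example, injected synonyms).
    public static int EffectiveLength(int length, int numOverlap, bool discountOverlaps)
    {
        // When overlaps are discounted, they do not add to the field length.
        return discountOverlaps ? length - numOverlap : length;
    }
}
```

For a field with 10 tokens, 3 of which are overlaps, the length used is 7 when DiscountOverlaps is true and 10 when it is false.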
K1
Declaration
public virtual float K1 { get; }
Property Value
Type | Description
System.Single |
Methods
AvgFieldLength(CollectionStatistics)
The default implementation computes the average as sumTotalTermFreq / maxDoc,
or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes
or any field that omits frequency information).
Declaration
protected virtual float AvgFieldLength(CollectionStatistics collectionStats)
Parameters
Type | Name | Description
CollectionStatistics | collectionStats |
Returns
Type | Description
System.Single |
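The rule described for AvgFieldLength can be sketched as a standalone function. This is an illustration only: AvgFieldLengthSketch and Average are hypothetical names, the statistics are passed as plain numbers instead of a CollectionStatistics instance, and treating a non-positive sumTotalTermFreq as "not stored" is an assumption.

```csharp
// Hypothetical helper, not part of Lucene.NET.
public static class AvgFieldLengthSketch
{
    public static float Average(long sumTotalTermFreq, long maxDoc)
    {
        // Assumption: a non-positive value signals that the statistic is unavailable.
        if (sumTotalTermFreq <= 0)
        {
            return 1f;
        }

        // Divide in double precision to avoid integer truncation.
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }
}
```

For example, 1000 term occurrences across 10 documents give an average field length of 100, while a missing statistic falls back to 1.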
ComputeNorm(FieldInvertState)
Declaration
public override sealed long ComputeNorm(FieldInvertState state)
Parameters
Type | Name | Description
FieldInvertState | state |
Returns
Type | Description
System.Int64 |
Overrides
Similarity.ComputeNorm(FieldInvertState)
ComputeWeight(Single, CollectionStatistics, TermStatistics[])
Declaration
public override sealed Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
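Parameters
Type | Name | Description
System.Single | queryBoost |
CollectionStatistics | collectionStats |
TermStatistics[] | termStats |
Returns
Type | Description
Similarity.SimWeight |
Overrides
Similarity.ComputeWeight(Single, CollectionStatistics, TermStatistics[])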
DecodeNormValue(Byte)
Declaration
protected virtual float DecodeNormValue(byte b)
Parameters
Type | Name | Description
System.Byte | b |
Returns
Type | Description
System.Single |
EncodeNormValue(Single, Int32)
Declaration
protected virtual byte EncodeNormValue(float boost, int fieldLength)
Parameters
Type | Name | Description
System.Single | boost |
System.Int32 | fieldLength |
Returns
Type | Description
System.Byte |
GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
Declaration
public override sealed Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
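Parameters
Type | Name | Description
Similarity.SimWeight | stats |
AtomicReaderContext | context |
Returns
Type | Description
Similarity.SimScorer |
Overrides
Similarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)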
Idf(Int64, Int64)
Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)).
Declaration
protected virtual float Idf(long docFreq, long numDocs)
Parameters
Type | Name | Description
System.Int64 | docFreq |
System.Int64 | numDocs |
Returns
Type | Description
System.Single |
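The formula given for Idf translates directly into code. The sketch below is a standalone illustration, assuming the natural logarithm; IdfSketch is a hypothetical class name and the method is not the library's own implementation.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class IdfSketch
{
    // log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5)), using the natural logarithm.
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }
}
```

Rare terms receive a higher weight: in a collection of 1000 documents, a term found in one document scores about 6.5, while a term found in 500 documents scores about 0.69. Adding 1 inside the logarithm keeps the result positive even when a term appears in every document.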
IdfExplain(CollectionStatistics, TermStatistics)
Computes a score factor for a simple term and returns an explanation
for that score factor.
The default implementation uses:
Idf(docFreq, searcher.MaxDoc);
Note that MaxDoc is used instead of NumDocs because DocFreq is also used,
and when the latter is inaccurate, so is MaxDoc, and in the same direction.
In addition, MaxDoc is more efficient to compute.
Declaration
public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Parameters
Type | Name | Description
CollectionStatistics | collectionStats |
TermStatistics | termStats |
Returns
Type | Description
Explanation | An Explanation object that includes both an idf score factor and an explanation for the term.
IdfExplain(CollectionStatistics, TermStatistics[])
Computes a score factor for a phrase.
The default implementation sums the idf factor for
each term in the phrase.
Declaration
public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)
Parameters
Type | Name | Description
CollectionStatistics | collectionStats |
TermStatistics[] | termStats |
Returns
Type | Description
Explanation | An Explanation object that includes both an idf score factor for the phrase and an explanation for each term.
ScorePayload(Int32, Int32, Int32, BytesRef)
The default implementation returns 1.
Declaration
protected virtual float ScorePayload(int doc, int start, int end, BytesRef payload)
Parameters
Type | Name | Description
System.Int32 | doc |
System.Int32 | start |
System.Int32 | end |
BytesRef | payload |
Returns
Type | Description
System.Single |
SloppyFreq(Int32)
Implemented as 1 / (distance + 1).
Declaration
protected virtual float SloppyFreq(int distance)
Parameters
Type | Name | Description
System.Int32 | distance |
Returns
Type | Description
System.Single |
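As a quick illustration of the formula above, the following standalone sketch reproduces 1 / (distance + 1). SloppyFreqSketch is a hypothetical class name, and the function only mirrors the documented expression.

```csharp
// Hypothetical helper, not part of Lucene.NET.
public static class SloppyFreqSketch
{
    // Float division, so the result is not truncated to an integer.
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }
}
```

A distance of 0 contributes a full frequency of 1, a distance of 1 contributes 0.5, and a distance of 3 contributes 0.25, so looser matches count for progressively less.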
ToString()
Declaration
public override string ToString()
Returns
Type | Description
System.String |
Overrides
System.Object.ToString()