Class BM25Similarity

BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. In Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994.

Note

This API is experimental and might change in incompatible ways in the next release.

Inheritance

object

Similarity

BM25Similarity

Inherited Members

Similarity.Coord(int, int)

Similarity.QueryNorm(float)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

Namespace: Lucene.Net.Search.Similarities

Assembly: Lucene.Net.dll

Syntax

public class BM25Similarity : Similarity

Constructors

BM25Similarity()

BM25 with these default values:

k1 = 1.2,
b = 0.75.

Declaration

public BM25Similarity()

BM25Similarity(float, float)

BM25 with the supplied parameter values.

Declaration

public BM25Similarity(float k1, float b)

Parameters

Type	Name	Description
float	k1	Controls non-linear term frequency normalization (saturation).
float	b	Controls to what degree document length normalizes tf values.
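
To make the roles of k1 and b concrete, here is a minimal sketch of the term-frequency component of the textbook BM25 formula. It is an illustration, not the library's code: the class Bm25TfSketch, the method TfNorm, and the docLength and avgDocLength parameters are hypothetical names introduced for this example.

```csharp
using System;

// Illustrative sketch only: a hypothetical helper, not part of Lucene.Net.
public static class Bm25TfSketch
{
    // Textbook BM25 term-frequency component:
    //   freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLength / avgDocLength))
    // k1 bounds how much repeated occurrences can add (saturation);
    // b scales how strongly the document length adjusts the result.
    public static float TfNorm(float freq, float k1, float b, float docLength, float avgDocLength)
    {
        float lengthNorm = k1 * ((1 - b) + b * docLength / avgDocLength);
        return freq * (k1 + 1) / (freq + lengthNorm);
    }
}
```

In this sketch the result never exceeds k1 + 1 however large freq becomes, and with b = 0 the document length has no effect on the result.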

Properties

B

Gets the b parameter.

Declaration

public virtual float B { get; }

Property Value

Type	Description
float

DiscountOverlaps

Gets or sets whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count when computing norms.

Declaration

public virtual bool DiscountOverlaps { get; set; }

Property Value

Type	Description
bool

K1

Gets the k1 parameter.

Declaration

public virtual float K1 { get; }

Property Value

Type	Description
float

Methods

AvgFieldLength(CollectionStatistics)

The default implementation computes the average as sumTotalTermFreq / maxDoc, or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes or any field that omits frequency information).

Declaration

protected virtual float AvgFieldLength(CollectionStatistics collectionStats)

Parameters

Type	Name	Description
CollectionStatistics	collectionStats

Returns

Type	Description
float
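
The averaging rule described above can be sketched as follows. This is a standalone illustration that takes the two statistics as plain numbers rather than a CollectionStatistics instance; the class name AvgFieldLengthSketch is hypothetical, and treating any non-positive sumTotalTermFreq as "not stored" is an assumption of this sketch.

```csharp
// Illustrative sketch only: a hypothetical helper, not part of Lucene.Net.
public static class AvgFieldLengthSketch
{
    public static float AvgFieldLength(long sumTotalTermFreq, long maxDoc)
    {
        // Assumption: a non-positive value means the statistic is unavailable,
        // in which case the average falls back to 1.
        if (sumTotalTermFreq <= 0)
        {
            return 1f;
        }
        // Divide in double precision to avoid integer truncation.
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }
}
```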

ComputeNorm(FieldInvertState)

Computes the normalization value for a field, given the accumulated state of term processing for this field (see FieldInvertState).

Matches in longer fields are less precise, so implementations of this method usually set smaller values when state.Length is large, and larger values when state.Length is small.

Note

This API is experimental and might change in incompatible ways in the next release.

Declaration

public override sealed long ComputeNorm(FieldInvertState state)

Parameters

Type	Name	Description
FieldInvertState	state	current processing state for this field

Returns

Type	Description
long	computed norm value

Overrides

Similarity.ComputeNorm(FieldInvertState)

ComputeWeight(float, CollectionStatistics, params TermStatistics[])

Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.

Declaration

public override sealed Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)

Parameters

Type	Name	Description
float	queryBoost	the query-time boost.
CollectionStatistics	collectionStats	collection-level statistics, such as the number of tokens in the collection.
TermStatistics[]	termStats	term-level statistics, such as the document frequency of a term across the collection.

Returns

Type	Description
Similarity.SimWeight	Similarity.SimWeight object with the information this Similarity needs to score a query.

Overrides

Similarity.ComputeWeight(float, CollectionStatistics, params TermStatistics[])

DecodeNormValue(byte)

The default implementation returns 1 / f² where f is Byte315ToSingle(byte).

Declaration

protected virtual float DecodeNormValue(byte b)

Parameters

Type	Name	Description
byte	b

Returns

Type	Description
float

EncodeNormValue(float, int)

The default implementation encodes boost / sqrt(length) with SingleToByte315(float). This is compatible with Lucene's default implementation. If you change this, then you should change DecodeNormValue(byte) to match.

Declaration

protected virtual byte EncodeNormValue(float boost, int fieldLength)

Parameters

Type	Name	Description
float	boost
int	fieldLength

Returns

Type	Description
byte
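
The relationship between the encoded and decoded values can be worked through with a small sketch. It deliberately leaves out the single-byte step (SingleToByte315 and Byte315ToSingle) and only shows the arithmetic on either side of it: if f = boost / sqrt(length), then 1 / f² = length / boost². The class and method names are hypothetical.

```csharp
using System;

// Illustrative sketch only: a hypothetical helper, not part of Lucene.Net.
// The lossy single-byte encoding step is intentionally omitted.
public static class NormSketch
{
    // The value handed to the byte encoder: boost / sqrt(fieldLength).
    public static float ValueToEncode(float boost, int fieldLength)
    {
        return boost / (float)Math.Sqrt(fieldLength);
    }

    // The value recovered after decoding f: 1 / f².
    public static float DecodedValue(float f)
    {
        return 1 / (f * f);
    }
}
```

With a boost of 1, DecodedValue(ValueToEncode(1f, 16)) gives back the field length, 16. Because the real encoding stores the value in one byte, the actual round trip only approximates the original length.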

GetSimScorer(SimWeight, AtomicReaderContext)

Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.

Declaration

public override sealed Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)

Parameters

Type	Name	Description
Similarity.SimWeight	stats
AtomicReaderContext	context	segment of the inverted index to be scored.

Returns

Type	Description
Similarity.SimScorer	A sloppy Similarity.SimScorer for scoring documents across context.

Overrides

Similarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)

Exceptions

Type	Condition
IOException	if there is a low-level I/O error

Idf(long, long)

Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)).

Declaration

protected virtual float Idf(long docFreq, long numDocs)

Parameters

Type	Name	Description
long	docFreq
long	numDocs

Returns

Type	Description
float
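
The formula above can be written out as a standalone sketch. The class name IdfSketch is hypothetical, and reading log as the natural logarithm is an assumption of this example.

```csharp
using System;

// Illustrative sketch only: a hypothetical helper, not part of Lucene.Net.
public static class IdfSketch
{
    // log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5)),
    // assuming log denotes the natural logarithm.
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }
}
```

Since the argument of the logarithm is always greater than 1 when docFreq is at most numDocs, the result stays positive, and rarer terms (smaller docFreq) receive larger values.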

IdfExplain(CollectionStatistics, TermStatistics)

Computes a score factor for a simple term and returns an explanation for that score factor.

The default implementation uses:

Idf(docFreq, searcher.MaxDoc);

Note that MaxDoc is used instead of NumDocs because DocFreq is also used, and when the latter is inaccurate, so is MaxDoc, and in the same direction. In addition, MaxDoc is more efficient to compute.

Declaration

public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics termStats)

Parameters

Type	Name	Description
CollectionStatistics	collectionStats	collection-level statistics
TermStatistics	termStats	term-level statistics for the term

Returns

Type	Description
Explanation	an Explanation object that includes both an idf score factor and an explanation for the term.

IdfExplain(CollectionStatistics, TermStatistics[])

Computes a score factor for a phrase.

The default implementation sums the idf factor for each term in the phrase.

Declaration

public virtual Explanation IdfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)

Parameters

Type	Name	Description
CollectionStatistics	collectionStats	collection-level statistics
TermStatistics[]	termStats	term-level statistics for the terms in the phrase

Returns

Type	Description
Explanation	an Explanation object that includes both an idf score factor for the phrase and an explanation for each term.

ScorePayload(int, int, int, BytesRef)

The default implementation returns 1.

Declaration

protected virtual float ScorePayload(int doc, int start, int end, BytesRef payload)

Parameters

Type	Name	Description
int	doc
int	start
int	end
BytesRef	payload

Returns

Type	Description
float

SloppyFreq(int)

Implemented as 1 / (distance + 1).

Declaration

protected virtual float SloppyFreq(int distance)

Parameters

Type	Name	Description
int	distance

Returns

Type	Description
float
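
As a quick illustration of the formula above, the following sketch computes the value for a given distance. The class name SloppyFreqSketch is hypothetical.

```csharp
// Illustrative sketch only: a hypothetical helper, not part of Lucene.Net.
public static class SloppyFreqSketch
{
    // 1 / (distance + 1): an exact match (distance 0) counts fully,
    // and the contribution shrinks as the distance grows.
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }
}
```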

ToString()

Returns a string that represents the current object.

Declaration

public override string ToString()

Returns

Type	Description
string	A string that represents the current object.

Overrides

object.ToString()
