Class DefaultSimilarity

Expert: Default scoring implementation, which encodes norm values as a single byte (EncodeNormValue(Single)) before they are stored. At search time, the norm byte value is read from the index Directory and decoded back to a float norm value (DecodeNormValue(Int64)). This encoding/decoding reduces index size at the price of precision loss: it is not guaranteed that Decode(Encode(x)) = x. For instance, with the byte layout described under EncodeNormValue(Single), Decode(Encode(0.89)) = 0.875.

Compression of norm values to a single byte saves memory at search time, because once a field is referenced at search time, its norms - for all documents - are maintained in memory.
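
As a rough illustration of the saving, the arithmetic can be sketched as follows. The class and method names are hypothetical, and the figures assume exactly one norm value per document for the field, compared against an uncompressed 4-byte float.

```csharp
public static class NormMemorySketch
{
    // Bytes needed to keep one field's norms in memory for every document.
    // Hypothetical helper for illustration; not part of Lucene.NET.
    public static long BytesForField(long docCount, int bytesPerNorm)
        => docCount * bytesPerNorm;
}
```

For 10,000,000 documents, BytesForField(10_000_000, 1) gives 10,000,000 bytes for single-byte norms, against 40,000,000 bytes if each norm were held as a 4-byte float.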

The rationale for such lossy compression of norm values is that, given how difficult (and inexact) it is for users to express their true information need in a query, only big differences matter.

Finally, note that the norm part of scoring is fixed at index time. Search time is too late to modify it, e.g. by using a different Similarity for search.

Inheritance

System.Object

Similarity

TFIDFSimilarity

DefaultSimilarity

Inherited Members

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

Namespace: Lucene.Net.Search.Similarities

Assembly: Lucene.Net.dll

Syntax

public class DefaultSimilarity : TFIDFSimilarity

Constructors

DefaultSimilarity()

Sole constructor; takes no parameters.

Declaration

public DefaultSimilarity()

Fields

m_discountOverlaps

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

Declaration

protected bool m_discountOverlaps

Field Value

Type	Description
System.Boolean

Properties

DiscountOverlaps

Determines whether overlap tokens (tokens with a position increment of zero) are ignored when computing the norm. By default this is true, meaning overlap tokens do not count when computing norms.

This is a Lucene.NET EXPERIMENTAL API; use at your own risk.

Declaration

public virtual bool DiscountOverlaps { get; set; }

Property Value

Type	Description
System.Boolean

Methods

Coord(Int32, Int32)

Implemented as overlap / maxOverlap.

Declaration

public override float Coord(int overlap, int maxOverlap)

Parameters

Type	Name	Description
System.Int32	overlap
System.Int32	maxOverlap

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.Coord(Int32, Int32)
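
The Coord formula above can be sketched in isolation as follows. This is a standalone illustration, not the library source; the class name is hypothetical and maxOverlap is assumed to be greater than zero.

```csharp
public static class CoordSketch
{
    // Fraction of the query terms that matched in the document.
    // Assumes maxOverlap > 0; the cast forces floating-point division.
    public static float Coord(int overlap, int maxOverlap)
        => overlap / (float)maxOverlap;
}
```

A document matching 2 of 4 query terms gets a factor of 0.5, and one matching all terms gets 1.0.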

DecodeNormValue(Int64)

Decodes the norm value, assuming it is a single byte.

Declaration

public override sealed float DecodeNormValue(long norm)

Parameters

Type	Name	Description
System.Int64	norm

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.DecodeNormValue(Int64)

EncodeNormValue(Single)

Encodes a normalization factor for storage in an index.

The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from around 7x10^9 to 2x10^-9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value.

Declaration

public override sealed long EncodeNormValue(float f)

Parameters

Type	Name	Description
System.Single	f

Returns

Type	Description
System.Int64

Overrides

TFIDFSimilarity.EncodeNormValue(Single)
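
To make the byte layout described for EncodeNormValue(Single) concrete, here is a minimal standalone sketch of an encode/decode pair that follows that description (three mantissa bits, zero-exponent point at 15). The class and method names are hypothetical, and the code is an illustration written from the description rather than a copy of the Lucene.NET source. It assumes a runtime that provides BitConverter.SingleToInt32Bits and BitConverter.Int32BitsToSingle.

```csharp
using System;

public static class NormByteSketch
{
    // Adjustment from the float's exponent bias to the encoding's
    // zero-exponent point of 15, shifted into the exponent position.
    private const int ExponentOffset = (63 - 15) << 3; // 384

    public static byte Encode(float f)
    {
        int bits = BitConverter.SingleToInt32Bits(f);
        int small = bits >> (24 - 3); // drop the low-order mantissa bits
        if (small <= ExponentOffset)
        {
            // Zero and negative values map to 0; positive values that are
            // too small map to the smallest positive code.
            return (byte)(bits <= 0 ? 0 : 1);
        }
        if (small >= ExponentOffset + 0x100)
        {
            return 0xFF; // too large: clamp to the largest representable value
        }
        return (byte)(small - ExponentOffset);
    }

    public static float Decode(byte b)
    {
        if (b == 0) return 0f;
        int bits = (b << (24 - 3)) + ((63 - 15) << 24);
        return BitConverter.Int32BitsToSingle(bits);
    }
}
```

With this sketch, 1.0 round-trips exactly (it encodes to byte 124), while 0.7071 (roughly 1/sqrt(2)) decodes back as 0.625 because the discarded mantissa bits are truncated, not rounded.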

Idf(Int64, Int64)

Implemented as log(numDocs/(docFreq+1)) + 1.

Declaration

public override float Idf(long docFreq, long numDocs)

Parameters

Type	Name	Description
System.Int64	docFreq
System.Int64	numDocs

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.Idf(Int64, Int64)
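
The Idf formula above can be sketched as follows, taking log to be the natural logarithm (Math.Log). The class name is hypothetical and the code is a standalone illustration rather than the library source.

```csharp
using System;

public static class IdfSketch
{
    // log(numDocs / (docFreq + 1)) + 1, computed in double precision
    // and narrowed to float at the end.
    public static float Idf(long docFreq, long numDocs)
        => (float)(Math.Log(numDocs / (double)(docFreq + 1)) + 1.0);
}
```

A term that appears in 9 of 10 documents gets exactly 1.0, while a term that appears in none of them gets about 3.30, so rarer terms contribute more to the score.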

LengthNorm(FieldInvertState)

Implemented as state.Boost * (1 / sqrt(numTerms)), where numTerms is state.Length if DiscountOverlaps is false, else state.Length - state.NumOverlap.

This is a Lucene.NET EXPERIMENTAL API; use at your own risk.

Declaration

public override float LengthNorm(FieldInvertState state)

Parameters

Type	Name	Description
FieldInvertState	state

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.LengthNorm(FieldInvertState)
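
The effect of DiscountOverlaps on the length norm can be sketched as follows. The sketch assumes the per-length factor is 1 / sqrt(numTerms), the usual TF-IDF choice. The class name and the flattened parameter list (standing in for the Boost, Length and NumOverlap values of a FieldInvertState) are hypothetical.

```csharp
using System;

public static class LengthNormSketch
{
    // boost, length and numOverlap stand in for the values carried by
    // FieldInvertState; numTerms is assumed to end up greater than zero.
    public static float LengthNorm(float boost, int length, int numOverlap, bool discountOverlaps)
    {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * (float)(1.0 / Math.Sqrt(numTerms));
    }
}
```

With discounting enabled, a field of 5 tokens of which 1 is an overlap token is treated as 4 terms and gets a norm of 0.5 at boost 1.0, the same as a 4-token field with no overlaps.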

QueryNorm(Single)

Implemented as 1/sqrt(sumOfSquaredWeights).

Declaration

public override float QueryNorm(float sumOfSquaredWeights)

Parameters

Type	Name	Description
System.Single	sumOfSquaredWeights

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.QueryNorm(Single)
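
A standalone sketch of the QueryNorm formula above; the class name is hypothetical and sumOfSquaredWeights is assumed to be positive.

```csharp
using System;

public static class QueryNormSketch
{
    // 1 / sqrt(sumOfSquaredWeights); assumes a positive argument.
    public static float QueryNorm(float sumOfSquaredWeights)
        => (float)(1.0 / Math.Sqrt(sumOfSquaredWeights));
}
```

A sum of squared weights of 4.0 yields a normalization factor of 0.5.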

ScorePayload(Int32, Int32, Int32, BytesRef)

The default implementation returns 1.

Declaration

public override float ScorePayload(int doc, int start, int end, BytesRef payload)

Parameters

Type	Name	Description
System.Int32	doc
System.Int32	start
System.Int32	end
BytesRef	payload

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.ScorePayload(Int32, Int32, Int32, BytesRef)

SloppyFreq(Int32)

Implemented as 1 / (distance + 1).

Declaration

public override float SloppyFreq(int distance)

Parameters

Type	Name	Description
System.Int32	distance

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.SloppyFreq(Int32)
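
A standalone sketch of the SloppyFreq formula above; the class name is hypothetical and distance is assumed to be non-negative.

```csharp
public static class SloppyFreqSketch
{
    // 1 / (distance + 1): an exact match (distance 0) counts fully,
    // and looser matches count for less.
    public static float SloppyFreq(int distance)
        => 1.0f / (distance + 1);
}
```

An exact phrase match (distance 0) contributes 1.0, while a match at distance 3 contributes 0.25.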

Tf(Single)

Implemented as Math.Sqrt(freq).

Declaration

public override float Tf(float freq)

Parameters

Type	Name	Description
System.Single	freq

Returns

Type	Description
System.Single

Overrides

TFIDFSimilarity.Tf(Single)
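
A standalone sketch of the Tf formula above; the class name is hypothetical.

```csharp
using System;

public static class TfSketch
{
    // Square root of the term frequency, so repeated occurrences
    // add progressively less to the score.
    public static float Tf(float freq)
        => (float)Math.Sqrt(freq);
}
```

A term that occurs 9 times in a field gets a factor of 3.0, not 9.0.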

ToString()

Declaration

public override string ToString()

Returns

Type	Description
System.String

Overrides

System.Object.ToString()
