    Class DefaultSimilarity

    Expert: Default scoring implementation that encodes norm values as a single byte (see EncodeNormValue(Single)) before they are stored. At search time, the norm byte value is read from the index Directory and decoded back to a float norm value (see DecodeNormValue(Int64)). This encoding/decoding reduces index size at the cost of precision: it is not guaranteed that Decode(Encode(x)) = x. For instance, Decode(Encode(0.89)) = 0.75.

    Compression of norm values to a single byte saves memory at search time, because once a field is referenced at search time, its norms - for all documents - are maintained in memory.

    The rationale for such lossy compression of norm values is that, because users find it difficult to express their true information need precisely in a query, only big differences matter.

    Finally, note that search time is too late to modify this norm part of scoring, e.g. by using a different Similarity for search, because norms are computed and stored at index time.
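
As a rough illustration of the precision loss described above, the sketch below round-trips a value through EncodeNormValue(Single) and DecodeNormValue(Int64) and prints the result so the loss can be inspected. It assumes the project references the Lucene.Net package; the class name NormRoundTrip and its RoundTrip helper are hypothetical and exist only for this example.

```csharp
using System;
using Lucene.Net.Search.Similarities;

public static class NormRoundTrip
{
    // Hypothetical helper: encodes a norm to its single-byte form and decodes it again.
    public static float RoundTrip(float norm)
    {
        var similarity = new DefaultSimilarity();
        long encoded = similarity.EncodeNormValue(norm); // lossy single-byte encoding
        return similarity.DecodeNormValue(encoded);      // back to a float norm
    }

    public static void Main()
    {
        float original = 0.89f;
        float decoded = RoundTrip(original);
        // The decoded value is generally not equal to the original.
        Console.WriteLine($"original={original}, decoded={decoded}");
    }
}
```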

    Inheritance
    System.Object
    Similarity
    TFIDFSimilarity
    DefaultSimilarity
    Inherited Members
    TFIDFSimilarity.IdfExplain(CollectionStatistics, TermStatistics)
    TFIDFSimilarity.IdfExplain(CollectionStatistics, TermStatistics[])
    TFIDFSimilarity.ComputeNorm(FieldInvertState)
    TFIDFSimilarity.ComputeWeight(Single, CollectionStatistics, TermStatistics[])
    TFIDFSimilarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Search.Similarities
    Assembly: Lucene.Net.dll
    Syntax
    public class DefaultSimilarity : TFIDFSimilarity

    Constructors

    DefaultSimilarity()

    Sole constructor: parameter-free

    Declaration
    public DefaultSimilarity()

    Fields

    m_discountOverlaps

    True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

    Declaration
    protected bool m_discountOverlaps
    Field Value
    Type Description
    System.Boolean

    Properties

    DiscountOverlaps

    Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count when computing norms.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
    Declaration
    public virtual bool DiscountOverlaps { get; set; }
    Property Value
    Type Description
    System.Boolean
    See Also
    ComputeNorm(FieldInvertState)

    Methods

    Coord(Int32, Int32)

    Implemented as overlap / maxOverlap.

    Declaration
    public override float Coord(int overlap, int maxOverlap)
    Parameters
    Type Name Description
    System.Int32 overlap
    System.Int32 maxOverlap
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.Coord(Int32, Int32)
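
The overlap / maxOverlap formula above can be sketched in isolation as follows. This is a standalone illustration, not the library's source; the class name CoordSketch is hypothetical, and the cast to float is an assumption made to avoid integer division.

```csharp
using System;

public static class CoordSketch
{
    // Fraction of query terms matched by a document: overlap / maxOverlap.
    public static float Coord(int overlap, int maxOverlap)
    {
        return overlap / (float)maxOverlap; // cast avoids integer division
    }

    public static void Main()
    {
        // A document matching 3 of 4 query terms.
        Console.WriteLine(Coord(3, 4)); // evaluates to 0.75
    }
}
```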

    DecodeNormValue(Int64)

    Decodes the norm value, assuming it is a single byte.

    Declaration
    public override sealed float DecodeNormValue(long norm)
    Parameters
    Type Name Description
    System.Int64 norm
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.DecodeNormValue(Int64)
    See Also
    EncodeNormValue(Single)

    EncodeNormValue(Single)

    Encodes a normalization factor for storage in an index.

    The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from around 7x10^9 to 2x10^-9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value.

    Declaration
    public override sealed long EncodeNormValue(float f)
    Parameters
    Type Name Description
    System.Single f
    Returns
    Type Description
    System.Int64
    Overrides
    TFIDFSimilarity.EncodeNormValue(Single)
    See Also
    Boost
    SmallSingle

    Idf(Int64, Int64)

    Implemented as log(numDocs/(docFreq+1)) + 1.

    Declaration
    public override float Idf(long docFreq, long numDocs)
    Parameters
    Type Name Description
    System.Int64 docFreq
    System.Int64 numDocs
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.Idf(Int64, Int64)
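
A minimal standalone sketch of the log(numDocs/(docFreq+1)) + 1 formula is shown below. It assumes the natural logarithm and a floating-point division; the class name IdfSketch is hypothetical and the code is not taken from the library.

```csharp
using System;

public static class IdfSketch
{
    // Inverse document frequency: log(numDocs / (docFreq + 1)) + 1.
    public static float Idf(long docFreq, long numDocs)
    {
        // Assumption: natural log, with the division done in double precision.
        return (float)(Math.Log(numDocs / (double)(docFreq + 1)) + 1.0);
    }

    public static void Main()
    {
        // Term present in 9 of 10 documents: log(10 / 10) + 1.
        Console.WriteLine(Idf(9, 10)); // evaluates to 1
        // A term present in no documents scores higher than a common one.
        Console.WriteLine(Idf(0, 10) > Idf(9, 10)); // True
    }
}
```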

    LengthNorm(FieldInvertState)

    Implemented as state.Boost * LengthNorm(numTerms), where numTerms is state.Length if DiscountOverlaps is false, and state.Length - state.NumOverlap otherwise.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
    Declaration
    public override float LengthNorm(FieldInvertState state)
    Parameters
    Type Name Description
    FieldInvertState state
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.LengthNorm(FieldInvertState)
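
The choice of numTerms described above can be sketched without the library as follows. The boost, length, and numOverlap parameters stand in for the corresponding FieldInvertState values. The inner normalization 1/sqrt(numTerms) is an assumption, since this page does not state it, and the class name LengthNormSketch is hypothetical.

```csharp
using System;

public static class LengthNormSketch
{
    // boost * lengthNorm(numTerms), where numTerms excludes overlap tokens
    // only when discountOverlaps is true.
    public static float LengthNorm(float boost, int length, int numOverlap, bool discountOverlaps)
    {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        // Assumption: the inner length normalization is 1 / sqrt(numTerms).
        return boost * (float)(1.0 / Math.Sqrt(numTerms));
    }

    public static void Main()
    {
        // 5 tokens, 1 of them an overlap token, discounted: numTerms = 4.
        Console.WriteLine(LengthNorm(1f, 5, 1, true));  // evaluates to 0.5
        // Same field without discounting: numTerms = 5, so the norm is smaller.
        Console.WriteLine(LengthNorm(1f, 5, 1, false) < 0.5f); // True
    }
}
```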

    QueryNorm(Single)

    Implemented as 1/sqrt(sumOfSquaredWeights).

    Declaration
    public override float QueryNorm(float sumOfSquaredWeights)
    Parameters
    Type Name Description
    System.Single sumOfSquaredWeights
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.QueryNorm(Single)
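
For a concrete feel of 1/sqrt(sumOfSquaredWeights), the standalone sketch below applies it to two made-up term weights. The class name QueryNormSketch and the weights 3 and 4 are hypothetical; the code is an illustration, not the library's source.

```csharp
using System;

public static class QueryNormSketch
{
    // Query normalization factor: 1 / sqrt(sumOfSquaredWeights).
    public static float QueryNorm(float sumOfSquaredWeights)
    {
        return (float)(1.0 / Math.Sqrt(sumOfSquaredWeights));
    }

    public static void Main()
    {
        // Two hypothetical term weights, 3 and 4: 3*3 + 4*4 = 25.
        float sumOfSquaredWeights = 3f * 3f + 4f * 4f;
        Console.WriteLine(QueryNorm(sumOfSquaredWeights)); // evaluates to 0.2
    }
}
```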

    ScorePayload(Int32, Int32, Int32, BytesRef)

    The default implementation returns 1.

    Declaration
    public override float ScorePayload(int doc, int start, int end, BytesRef payload)
    Parameters
    Type Name Description
    System.Int32 doc
    System.Int32 start
    System.Int32 end
    BytesRef payload
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.ScorePayload(Int32, Int32, Int32, BytesRef)

    SloppyFreq(Int32)

    Implemented as 1 / (distance + 1).

    Declaration
    public override float SloppyFreq(int distance)
    Parameters
    Type Name Description
    System.Int32 distance
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.SloppyFreq(Int32)
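
The 1 / (distance + 1) formula means an exact phrase match (distance 0) counts fully, and looser matches count progressively less. A standalone sketch follows; the class name SloppyFreqSketch is hypothetical and the code is not the library's source.

```csharp
using System;

public static class SloppyFreqSketch
{
    // Contribution of a sloppy phrase match: 1 / (distance + 1).
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }

    public static void Main()
    {
        Console.WriteLine(SloppyFreq(0)); // exact match, evaluates to 1
        Console.WriteLine(SloppyFreq(1)); // evaluates to 0.5
        Console.WriteLine(SloppyFreq(3)); // evaluates to 0.25
    }
}
```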

    Tf(Single)

    Implemented as Math.Sqrt(freq).

    Declaration
    public override float Tf(float freq)
    Parameters
    Type Name Description
    System.Single freq
    Returns
    Type Description
    System.Single
    Overrides
    TFIDFSimilarity.Tf(Single)
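
Because the term-frequency factor is a square root, repeated occurrences of a term add progressively less to the score. The standalone sketch below illustrates this; the class name TfSketch is hypothetical and the code is not the library's source.

```csharp
using System;

public static class TfSketch
{
    // Term-frequency factor: sqrt(freq).
    public static float Tf(float freq)
    {
        return (float)Math.Sqrt(freq);
    }

    public static void Main()
    {
        Console.WriteLine(Tf(1f)); // evaluates to 1
        Console.WriteLine(Tf(4f)); // evaluates to 2
        Console.WriteLine(Tf(9f)); // evaluates to 3
    }
}
```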

    ToString()

    Declaration
    public override string ToString()
    Returns
    Type Description
    System.String
    Overrides
    System.Object.ToString()
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)