    Class SimilarityBase

    A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the Score(BasicStats, float, float) and ToString() methods. Implementing Explain(Explanation, BasicStats, int, float, float) is optional, inasmuch as SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.

    Note: multi-word queries such as phrase queries are scored differently from Lucene's default ranking algorithm. Whereas the default algorithm "fakes" an IDF value for the phrase as a whole (because the true value is not known), this class scores a phrase as the sum of its individual term scores.
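
    The summation described in the note above can be pictured with a small, self-contained sketch. The PhraseScoreSketch class and its termScore delegate are hypothetical stand-ins, not part of Lucene.Net; they only illustrate that a phrase score is the sum of the per-term scores.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: none of these names exist in Lucene.Net.
public static class PhraseScoreSketch
{
    // Sums the score of every term in the phrase, using a caller-supplied
    // per-term scoring function (a stand-in for a SimilarityBase formula).
    public static float ScorePhrase(IEnumerable<string> terms, Func<string, float> termScore)
    {
        float total = 0f;
        foreach (string term in terms)
        {
            total += termScore(term);
        }
        return total;
    }
}
```

    For instance, calling ScorePhrase with two terms and a scorer that returns 1.5 for each term would yield 3.0 for the phrase.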

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    Similarity
    SimilarityBase
    DFRSimilarity
    IBSimilarity
    LMSimilarity
    Inherited Members
    Similarity.Coord(int, int)
    Similarity.QueryNorm(float)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Search.Similarities
    Assembly: Lucene.Net.dll
    Syntax
    public abstract class SimilarityBase : Similarity
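
    As a rough illustration of the two required members, the sketch below defines a minimal subclass. The class name LengthNormalizedTfSimilarity and its formula are hypothetical and chosen only for brevity; the formula is a toy, not a recommended ranking function. It assumes that BasicStats exposes the accumulated boost through a TotalBoost property.

```csharp
using Lucene.Net.Search.Similarities;

// Hypothetical example class: a toy similarity, not a recommended ranking function.
public sealed class LengthNormalizedTfSimilarity : SimilarityBase
{
    // Term frequency divided by document length, scaled by the boost
    // carried in the stats object. Returns 0 for empty documents.
    public override float Score(BasicStats stats, float freq, float docLen)
    {
        if (docLen <= 0f)
        {
            return 0f;
        }
        return stats.TotalBoost * freq / docLen;
    }

    // Returns the name of the similarity; this one has no parameters to report.
    public override string ToString()
    {
        return "LengthNormalizedTf";
    }
}
```

    An instance of such a class would typically be assigned to the Similarity property of both the IndexWriterConfig used at index time and the IndexSearcher used at query time, so that norms and scores are computed consistently.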

    Constructors

    SimilarityBase()

    Sole constructor. (For invocation by subclass constructors, typically implicit.)

    Declaration
    protected SimilarityBase()

    Properties

    DiscountOverlaps

    Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count toward the document length used for the norm.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Declaration
    public virtual bool DiscountOverlaps { get; set; }
    Property Value
    Type Description
    bool
    See Also
    ComputeNorm(FieldInvertState)
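
    The effect of this property on the length that feeds the norm can be sketched as follows. OverlapSketch and EffectiveLength are hypothetical names used only for illustration; the parameters stand in for the token count and overlap count that the field's FieldInvertState would carry, which is an assumption about how the norm is derived.

```csharp
// Hypothetical illustration only: not part of Lucene.Net.
public static class OverlapSketch
{
    // Returns the token count used as the document length:
    // overlap tokens are subtracted only when discounting is enabled.
    public static int EffectiveLength(int length, int numOverlap, bool discountOverlaps)
    {
        return discountOverlaps ? length - numOverlap : length;
    }
}
```

    With 10 tokens of which 3 are overlaps (for example, synonyms injected at the same position), the sketch gives a length of 7 when discounting is on and 10 when it is off.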

    Methods

    ComputeNorm(FieldInvertState)

    Encodes the document length in the same way as TFIDFSimilarity.

    Declaration
    public override long ComputeNorm(FieldInvertState state)
    Parameters
    Type Name Description
    FieldInvertState state
    Returns
    Type Description
    long
    Overrides
    Similarity.ComputeNorm(FieldInvertState)

    ComputeWeight(float, CollectionStatistics, params TermStatistics[])

    Computes any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.

    Declaration
    public override sealed Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
    Parameters
    Type Name Description
    float queryBoost

    the query-time boost.

    CollectionStatistics collectionStats

    collection-level statistics, such as the number of tokens in the collection.

    TermStatistics[] termStats

    term-level statistics, such as the document frequency of a term across the collection.

    Returns
    Type Description
    Similarity.SimWeight

    Similarity.SimWeight object with the information this Similarity needs to score a query.

    Overrides
    Similarity.ComputeWeight(float, CollectionStatistics, params TermStatistics[])

    DecodeNormValue(byte)

    Decodes a normalization factor (document length) stored in an index.

    Declaration
    protected virtual float DecodeNormValue(byte norm)
    Parameters
    Type Name Description
    byte norm
    Returns
    Type Description
    float

    EncodeNormValue(float, float)

    Encodes the length to a byte via SmallSingle.

    Declaration
    protected virtual byte EncodeNormValue(float boost, float length)
    Parameters
    Type Name Description
    float boost
    float length
    Returns
    Type Description
    byte
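
    The relationship between EncodeNormValue and DecodeNormValue can be sketched without the byte quantization step. The code below assumes the conventional boost / sqrt(length) encoding used by TFIDFSimilarity and deliberately omits the lossy 8-bit compression performed by SmallSingle. NormSketch and its members are hypothetical names.

```csharp
using System;

// Hypothetical illustration only: omits the lossy byte quantization.
public static class NormSketch
{
    // The float value that would be handed to the byte encoder
    // (assumed convention: boost / sqrt(length)).
    public static float ToNormFloat(float boost, float length)
    {
        return boost / (float)Math.Sqrt(length);
    }

    // Inverts the encoding for a boost of 1: length = 1 / (f * f).
    public static float ToLength(float normFloat)
    {
        return 1f / (normFloat * normFloat);
    }
}
```

    Because the real encoding squeezes the value into a single byte, a decoded length is only an approximation of the original, and documents of similar length may share the same norm.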

    Explain(Explanation, BasicStats, int, float, float)

    Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.

    The default implementation does nothing.

    Declaration
    protected virtual void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
    Parameters
    Type Name Description
    Explanation expl

    the explanation to extend with details.

    BasicStats stats

    the corpus level statistics.

    int doc

    the document id.

    float freq

    the term frequency.

    float docLen

    the document length.
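
    A minimal sketch of overriding this method follows. ExplainedTfSimilarity and its formula are hypothetical; the example assumes the Explanation(float, string) constructor and the AddDetail method from Lucene.Net.Search, and that BasicStats exposes a TotalBoost property.

```csharp
using Lucene.Net.Search;
using Lucene.Net.Search.Similarities;

// Hypothetical example class showing how extra clauses might be attached.
public sealed class ExplainedTfSimilarity : SimilarityBase
{
    // Toy formula: boost * freq / docLen, or 0 for empty documents.
    public override float Score(BasicStats stats, float freq, float docLen)
    {
        return docLen > 0f ? stats.TotalBoost * freq / docLen : 0f;
    }

    // Adds the inputs of the formula as extra clauses; the score and the
    // term frequency are already present in expl when this is called.
    protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
    {
        expl.AddDetail(new Explanation(stats.TotalBoost, "boost"));
        expl.AddDetail(new Explanation(docLen, "document length"));
    }

    public override string ToString()
    {
        return "ExplainedTf";
    }
}
```

    The added clauses then appear alongside the term frequency in the Explanation returned by the public Explain overload.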

    Explain(BasicStats, int, Explanation, float)

    Explains the score. The implementation here provides a basic explanation in the format "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:", and attaches the score (computed via the Score(BasicStats, float, float) method) and the explanation for the term frequency. Subclasses that are content with this format may add further details in Explain(Explanation, BasicStats, int, float, float).

    Declaration
    public virtual Explanation Explain(BasicStats stats, int doc, Explanation freq, float docLen)
    Parameters
    Type Name Description
    BasicStats stats

    the corpus level statistics.

    int doc

    the document id.

    Explanation freq

    the term frequency and its explanation.

    float docLen

    the document length.

    Returns
    Type Description
    Explanation

    the explanation.

    FillBasicStats(BasicStats, CollectionStatistics, TermStatistics)

    Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.

    Declaration
    protected virtual void FillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
    Parameters
    Type Name Description
    BasicStats stats
    CollectionStatistics collectionStats
    TermStatistics termStats

    GetSimScorer(SimWeight, AtomicReaderContext)

    Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.

    Declaration
    public override Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
    Parameters
    Type Name Description
    Similarity.SimWeight stats
    AtomicReaderContext context

    segment of the inverted index to be scored.

    Returns
    Type Description
    Similarity.SimScorer

    A Similarity.SimScorer for scoring documents across context.

    Overrides
    Similarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
    Exceptions
    Type Condition
    IOException

    if there is a low-level I/O error

    Log2(double)

    Returns the base two logarithm of x.

    Declaration
    public static double Log2(double x)
    Parameters
    Type Name Description
    double x
    Returns
    Type Description
    double
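
    A base-two logarithm is commonly obtained from the natural logarithm through the change-of-base identity log2(x) = ln(x) / ln(2). The self-contained sketch below shows that identity; Log2Sketch is a hypothetical name, and whether SimilarityBase computes the value in exactly this way is an assumption.

```csharp
using System;

// Hypothetical illustration of the change-of-base identity.
public static class Log2Sketch
{
    // log2(x) = ln(x) / ln(2)
    public static double Log2(double x)
    {
        return Math.Log(x) / Math.Log(2.0);
    }
}
```

    Subclasses would normally call the static SimilarityBase.Log2(x) directly rather than reimplementing it; for x = 8 the result is approximately 3, up to floating-point rounding.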

    NewStats(string, float)

    Factory method that returns a custom stats object.

    Declaration
    protected virtual BasicStats NewStats(string field, float queryBoost)
    Parameters
    Type Name Description
    string field
    float queryBoost
    Returns
    Type Description
    BasicStats
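
    NewStats and FillBasicStats are typically overridden together when a formula needs a statistic that BasicStats does not carry. The following is a sketch under stated assumptions: RatioStats, DocRatio and RatioSimilarity are hypothetical names, the formula is a toy, and the example assumes that BasicStats exposes NumberOfDocuments, DocFreq and TotalBoost properties.

```csharp
using Lucene.Net.Search;
using Lucene.Net.Search.Similarities;

// Hypothetical stats type carrying one extra collection-level value.
public class RatioStats : BasicStats
{
    public RatioStats(string field, float queryBoost) : base(field, queryBoost) { }

    // Fraction of documents that contain the term (illustrative only).
    public float DocRatio { get; set; }
}

// Hypothetical similarity that relies on the custom stats object.
public sealed class RatioSimilarity : SimilarityBase
{
    // Hands back the custom stats type instead of a plain BasicStats.
    protected override BasicStats NewStats(string field, float queryBoost)
    {
        return new RatioStats(field, queryBoost);
    }

    // Lets the base class fill the standard fields, then derives the extra one.
    protected override void FillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
    {
        base.FillBasicStats(stats, collectionStats, termStats);
        var ratioStats = (RatioStats)stats;
        ratioStats.DocRatio = stats.NumberOfDocuments > 0
            ? (float)stats.DocFreq / stats.NumberOfDocuments
            : 0f;
    }

    // Toy formula: rarer terms (lower DocRatio) contribute more.
    public override float Score(BasicStats stats, float freq, float docLen)
    {
        var ratioStats = (RatioStats)stats;
        return stats.TotalBoost * freq * (1f - ratioStats.DocRatio);
    }

    public override string ToString()
    {
        return "Ratio";
    }
}
```

    The cast in Score is safe here only because the same class creates the stats object in NewStats; a subclass that does not override NewStats should not assume a custom type.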

    Score(BasicStats, float, float)

    Scores a document, given its term frequency and length.

    Subclasses must apply their scoring formula in this method.

    Declaration
    public abstract float Score(BasicStats stats, float freq, float docLen)
    Parameters
    Type Name Description
    BasicStats stats

    the corpus level statistics.

    float freq

    the term frequency.

    float docLen

    the document length.

    Returns
    Type Description
    float

    the score.

    ToString()

    Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

    Declaration
    public override abstract string ToString()
    Returns
    Type Description
    string
    Overrides
    object.ToString()
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.