Class SimilarityBase
A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the Score(BasicStats, float, float) and ToString() methods. Implementing Explain(Explanation, BasicStats, int, float, float) is optional, inasmuch as SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
Note: multi-word queries such as phrase queries are scored differently than in Lucene's default ranking algorithm: whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know the phrase's actual IDF), this class scores phrases as a summation of the individual term scores.
Note
This API is experimental and might change in incompatible ways in the next release.
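Example
The following is a minimal sketch of a subclass that implements only the two required members, Score(BasicStats, float, float) and ToString(). The class name SimpleTfIdfSimilarity and its formula are hypothetical and chosen for illustration only; the BasicStats members used (NumberOfDocuments, DocFreq, AvgFieldLength, TotalBoost) are assumed to be available as in Lucene.NET 4.8.

```csharp
using System;
using Lucene.Net.Search.Similarities;

// Hypothetical example class; the formula is illustrative, not a published model.
public sealed class SimpleTfIdfSimilarity : SimilarityBase
{
    // Scores one term in one document from the corpus statistics,
    // the term frequency, and the document length.
    public override float Score(BasicStats stats, float freq, float docLen)
    {
        // Term frequency dampened by the document length relative to the average.
        double tf = freq / (freq + docLen / stats.AvgFieldLength);

        // Smoothed inverse document frequency, using the inherited Log2 helper.
        double idf = Log2(1.0 + (stats.NumberOfDocuments + 1.0) / (stats.DocFreq + 1.0));

        return (float)(stats.TotalBoost * tf * idf);
    }

    // Returns the name of the similarity; there are no parameters to report.
    public override string ToString() => "SimpleTfIdf";
}
```

A similarity like this would typically be assigned to both IndexWriterConfig.Similarity (before indexing) and IndexSearcher.Similarity (before searching), so that norms and scores come from the same model.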
Namespace: Lucene.Net.Search.Similarities
Assembly: Lucene.Net.dll
Syntax
public abstract class SimilarityBase : Similarity
Constructors
SimilarityBase()
Sole constructor. (For invocation by subclass constructors, typically implicit.)
Declaration
protected SimilarityBase()
Properties
DiscountOverlaps
Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing the norm. By default this is true, meaning overlap tokens do not count when computing norms.
Note
This API is experimental and might change in incompatible ways in the next release.
Declaration
public virtual bool DiscountOverlaps { get; set; }
Property Value
Type | Description |
---|---|
bool |
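Example
A short sketch of turning the default off so that overlap tokens (for example, synonyms injected at the same position) count toward the document length. It assumes LMDirichletSimilarity, one of the SimilarityBase subclasses in Lucene.NET, with its parameterless constructor; the wrapper class and method names are hypothetical.

```csharp
using Lucene.Net.Search.Similarities;

public static class DiscountOverlapsExample
{
    // Returns a similarity whose norms also count overlap tokens.
    public static SimilarityBase CreateCountingOverlaps()
    {
        // DiscountOverlaps defaults to true; set it to false before indexing.
        return new LMDirichletSimilarity { DiscountOverlaps = false };
    }
}
```

Because norms are computed when documents are indexed, the setting only affects documents indexed with a similarity configured this way.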
Methods
ComputeNorm(FieldInvertState)
Encodes the document length in the same way as TFIDFSimilarity.
Declaration
public override long ComputeNorm(FieldInvertState state)
Parameters
Type | Name | Description |
---|---|---|
FieldInvertState | state |
Returns
Type | Description |
---|---|
long |
Overrides
ComputeWeight(float, CollectionStatistics, params TermStatistics[])
Computes any collection-level weight (e.g., IDF or average document length) needed for scoring a query.
Declaration
public override sealed Similarity.SimWeight ComputeWeight(float queryBoost, CollectionStatistics collectionStats, params TermStatistics[] termStats)
Parameters
Type | Name | Description |
---|---|---|
float | queryBoost | the query-time boost. |
CollectionStatistics | collectionStats | collection-level statistics, such as the number of tokens in the collection. |
TermStatistics[] | termStats | term-level statistics, such as the document frequency of a term across the collection. |
Returns
Type | Description |
---|---|
Similarity.SimWeight | Similarity.SimWeight object with the information this Similarity needs to score a query. |
Overrides
DecodeNormValue(byte)
Decodes a normalization factor (document length) stored in an index.
Declaration
protected virtual float DecodeNormValue(byte norm)
Parameters
Type | Name | Description |
---|---|---|
byte | norm |
Returns
Type | Description |
---|---|
float |
EncodeNormValue(float, float)
Encodes the length to a byte via SmallSingle.
Declaration
protected virtual byte EncodeNormValue(float boost, float length)
Parameters
Type | Name | Description |
---|---|---|
float | boost | |
float | length |
Returns
Type | Description |
---|---|
byte |
Explain(Explanation, BasicStats, int, float, float)
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Declaration
protected virtual void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
Explanation | expl | the explanation to extend with details. |
BasicStats | stats | the corpus level statistics. |
int | doc | the document id. |
float | freq | the term frequency. |
float | docLen | the document length. |
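Example
A sketch of how a subclass might add its own clauses to the explanation. The class ExplainedTfSimilarity, its helper NormalizedTf, and the formula are hypothetical; Explanation.AddDetail and the Explanation(float, string) constructor are assumed to be available as in Lucene.NET 4.8.

```csharp
using Lucene.Net.Search;
using Lucene.Net.Search.Similarities;

// Hypothetical example class used only to illustrate Explain.
public sealed class ExplainedTfSimilarity : SimilarityBase
{
    public override float Score(BasicStats stats, float freq, float docLen)
    {
        return stats.TotalBoost * NormalizedTf(stats, freq, docLen);
    }

    // expl already holds the score and the term frequency; add the
    // intermediate values that the scoring formula is built from.
    protected override void Explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
    {
        if (stats.TotalBoost != 1.0f)
        {
            expl.AddDetail(new Explanation(stats.TotalBoost, "boost"));
        }
        expl.AddDetail(new Explanation(
            NormalizedTf(stats, freq, docLen),
            "normalized tf = freq / (freq + docLen / avgFieldLength)"));
    }

    // Term frequency dampened by the relative document length.
    private static float NormalizedTf(BasicStats stats, float freq, float docLen)
    {
        return freq / (freq + docLen / stats.AvgFieldLength);
    }

    public override string ToString() => "ExplainedTf";
}
```

Sharing one helper between Score and Explain keeps the explanation from drifting away from the value that is actually computed.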
Explain(BasicStats, int, Explanation, float)
Explains the score. The implementation here provides a basic explanation in the format "Score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:", and attaches the score (computed via the Score(BasicStats, float, float) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in Explain(Explanation, BasicStats, int, float, float).
Declaration
public virtual Explanation Explain(BasicStats stats, int doc, Explanation freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
BasicStats | stats | the corpus level statistics. |
int | doc | the document id. |
Explanation | freq | the term frequency and its explanation. |
float | docLen | the document length. |
Returns
Type | Description |
---|---|
Explanation | the explanation. |
FillBasicStats(BasicStats, CollectionStatistics, TermStatistics)
Fills all member fields defined in BasicStats in stats.
Subclasses can override this method to fill additional stats.
Declaration
protected virtual void FillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
Parameters
Type | Name | Description |
---|---|---|
BasicStats | stats | |
CollectionStatistics | collectionStats | |
TermStatistics | termStats |
GetSimScorer(SimWeight, AtomicReaderContext)
Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
Declaration
public override Similarity.SimScorer GetSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)
Parameters
Type | Name | Description |
---|---|---|
Similarity.SimWeight | stats | |
AtomicReaderContext | context | segment of the inverted index to be scored. |
Returns
Type | Description |
---|---|
Similarity.SimScorer | Sloppy Similarity.SimScorer for scoring documents across context. |
Overrides
Exceptions
Type | Condition |
---|---|
IOException | if there is a low-level I/O error |
Log2(double)
Returns the base-two logarithm of x.
Declaration
public static double Log2(double x)
Parameters
Type | Name | Description |
---|---|---|
double | x |
Returns
Type | Description |
---|---|
double |
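Example
For reference, a base-two logarithm can be obtained from the natural logarithm by a change of base. The standalone sketch below shows that identity; the class name Log2Example is hypothetical, and whether SimilarityBase uses exactly this expression internally is an assumption.

```csharp
using System;

public static class Log2Example
{
    // Change of base: log2(x) = ln(x) / ln(2).
    public static double Log2(double x)
    {
        return Math.Log(x) / Math.Log(2.0);
    }
}
```

Because the method on SimilarityBase is public and static, subclasses can call Log2(x) directly inside Score, and other code can call SimilarityBase.Log2(x).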
NewStats(string, float)
Factory method that returns a custom stats object.
Declaration
protected virtual BasicStats NewStats(string field, float queryBoost)
Parameters
Type | Name | Description |
---|---|---|
string | field | |
float | queryBoost |
Returns
Type | Description |
---|---|
BasicStats |
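Example
NewStats(string, float) and FillBasicStats(BasicStats, CollectionStatistics, TermStatistics) are usually overridden together: the first returns a BasicStats subclass and the second fills its extra fields. The sketch below assumes that pairing; RarityStats, RaritySimilarity, CollectionProbability, and the scoring formula are hypothetical names and choices made for illustration.

```csharp
using System;
using Lucene.Net.Search;
using Lucene.Net.Search.Similarities;

// Hypothetical stats object carrying one extra, precomputed statistic.
public sealed class RarityStats : BasicStats
{
    public RarityStats(string field, float queryBoost) : base(field, queryBoost) { }

    // Smoothed fraction of the field's tokens that are this term.
    public float CollectionProbability { get; set; }
}

// Hypothetical similarity that uses the custom stats object.
public sealed class RaritySimilarity : SimilarityBase
{
    // Hands the base class a RarityStats instead of a plain BasicStats.
    protected override BasicStats NewStats(string field, float queryBoost)
    {
        return new RarityStats(field, queryBoost);
    }

    // Lets the base class fill the standard fields, then adds the extra one.
    protected override void FillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
    {
        base.FillBasicStats(stats, collectionStats, termStats);
        var rarity = (RarityStats)stats;
        rarity.CollectionProbability =
            (stats.TotalTermFreq + 1f) / (stats.NumberOfFieldTokens + 1f);
    }

    public override float Score(BasicStats stats, float freq, float docLen)
    {
        var rarity = (RarityStats)stats;
        // Guard against a zero document length before dividing.
        float expected = Math.Max(docLen, 1f) * rarity.CollectionProbability;
        return stats.TotalBoost * (float)Log2(1.0 + freq / expected);
    }

    public override string ToString() => "Rarity";
}
```

Computing the extra statistic once per query term in FillBasicStats keeps Score, which runs once per matching document, as cheap as possible.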
Score(BasicStats, float, float)
Scores the document.
Subclasses must apply their scoring formula in this method.
Declaration
public abstract float Score(BasicStats stats, float freq, float docLen)
Parameters
Type | Name | Description |
---|---|---|
BasicStats | stats | the corpus level statistics. |
float | freq | the term frequency. |
float | docLen | the document length. |
Returns
Type | Description |
---|---|
float | the score. |
ToString()
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Declaration
public override abstract string ToString()
Returns
Type | Description |
---|---|
string |