Class FuzzyQuery
Implements the fuzzy search query. The similarity measurement
is based on the Damerau-Levenshtein (optimal string alignment) algorithm,
though you can explicitly choose classic Levenshtein by passing false
to the transpositions parameter.
This query uses MultiTermQuery.TopTermsScoringBooleanQueryRewrite
as its default rewrite method, so terms are collected and scored according to their
edit distance, and only the top terms are used to build the BooleanQuery.
Changing the rewrite mode for fuzzy queries is not recommended.
This query matches terms that are at most MAXIMUM_SUPPORTED_DISTANCE edits away.
Higher distances (especially with transpositions enabled) are generally not useful and
would match a significant portion of the term dictionary. If you really need them, consider
using an n-gram indexing technique (such as the SpellChecker in the
suggest module) instead.
NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled
distance between two terms is computed. For a term to match, the edit distance between
the terms must be less than the length of the shorter term (either the input term or
the candidate term). For example, a FuzzyQuery on term "abcd" with maxEdits=2 will
not match an indexed term "ab", and a FuzzyQuery on term "a" with maxEdits=2 will not
match an indexed term "abc".
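The two ideas above (transpositions as a single edit, and the short-term rule in the NOTE) can be sketched with a small dynamic-programming distance function. This is only an illustration under stated assumptions: the class and method names are hypothetical, it compares UTF-16 chars rather than code points, and it is not the automaton-based matching that FuzzyQuery performs internally.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and not part of Lucene.NET.
public static class EditDistanceSketch
{
    // Edit distance between a and b. With transpositions == true, swapping two
    // adjacent characters counts as one edit (optimal string alignment);
    // with false this is classic Levenshtein.
    public static int Distance(string a, string b, bool transpositions)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // delete, insert
                    d[i - 1, j - 1] + cost);                    // substitute or keep

                // Adjacent swap, counted as a single edit when enabled.
                if (transpositions && i > 1 && j > 1
                    && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }
                d[i, j] = best;
            }
        }
        return d[a.Length, b.Length];
    }

    // Restates the NOTE: the distance must be strictly less than the length
    // of the shorter of the two terms. This checks only that rule, not maxEdits.
    public static bool PassesShortTermRule(string queryTerm, string candidate, bool transpositions)
    {
        int distance = Distance(queryTerm, candidate, transpositions);
        return distance < Math.Min(queryTerm.Length, candidate.Length);
    }
}
```

With this sketch, "ab" and "ba" are one edit apart when transpositions are enabled and two edits apart otherwise, and both examples from the NOTE ("abcd" against "ab", "a" against "abc") fail the short-term rule.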
Assembly: Lucene.Net.dll
Syntax
public class FuzzyQuery : MultiTermQuery
Constructors
FuzzyQuery(Term)
Declaration
public FuzzyQuery(Term term)
Parameters
Type | Name | Description
Term | term | The term to search for.
FuzzyQuery(Term, int)
Declaration
public FuzzyQuery(Term term, int maxEdits)
Parameters
Type | Name | Description
Term | term | The term to search for.
int | maxEdits | Must be >= 0 and <= MAXIMUM_SUPPORTED_DISTANCE.
FuzzyQuery(Term, int, int)
Declaration
public FuzzyQuery(Term term, int maxEdits, int prefixLength)
Parameters
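Type | Name | Description
Term | term | The term to search for.
int | maxEdits | Must be >= 0 and <= MAXIMUM_SUPPORTED_DISTANCE.
int | prefixLength | Length of the common (non-fuzzy) prefix.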
FuzzyQuery(Term, int, int, int, bool)
Create a new FuzzyQuery that will match terms with an edit distance
of at most maxEdits to term. If a prefixLength > 0 is specified, a common prefix
of that length is also required.
Declaration
public FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, bool transpositions)
Parameters
Type | Name | Description
Term | term | The term to search for.
int | maxEdits | Must be >= 0 and <= MAXIMUM_SUPPORTED_DISTANCE.
int | prefixLength | Length of the common (non-fuzzy) prefix.
int | maxExpansions | The maximum number of terms to match. If this number is greater than MaxClauseCount when the query is rewritten, MaxClauseCount is used instead.
bool | transpositions | true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
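A minimal usage sketch of this constructor follows, assuming the Lucene.Net package is referenced. The field name "title", the term text, and the wrapper class are placeholders chosen for illustration; the argument names come from the declaration above.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical wrapper; only FuzzyQuery and Term come from Lucene.NET.
public static class FuzzyQueryUsageSketch
{
    public static FuzzyQuery Build()
    {
        return new FuzzyQuery(
            new Term("title", "lucene"),                     // placeholder field and text
            maxEdits: 1,                                     // allow at most one edit
            prefixLength: 2,                                 // first two characters must match exactly
            maxExpansions: FuzzyQuery.DefaultMaxExpansions,  // keep the default expansion limit
            transpositions: false);                          // classic Levenshtein
    }
}
```

A non-zero prefixLength narrows the set of candidate terms that have to be examined, which is one reason it is often set when the leading characters of the input are likely to be correct.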
Fields
DefaultMaxEdits
Declaration
public const int DefaultMaxEdits = 2
Field Value
DefaultMaxExpansions
Declaration
public const int DefaultMaxExpansions = 50
Field Value
DefaultMinSimilarity
Declaration
[Obsolete("pass integer edit distances instead.")]
public const float DefaultMinSimilarity = 2
Field Value
DefaultPrefixLength
Declaration
public const int DefaultPrefixLength = 0
Field Value
DefaultTranspositions
Declaration
public const bool DefaultTranspositions = true
Field Value
Properties
MaxEdits
Declaration
public virtual int MaxEdits { get; }
Property Value
Type | Description
int | The maximum edit distance allowed for this query to match.
PrefixLength
Returns the non-fuzzy prefix length. This is the number of characters at the start
of a term that must be identical (not fuzzy) to the query term if the query
is to match that term.
Declaration
public virtual int PrefixLength { get; }
Property Value
Term
Returns the pattern term.
Declaration
public virtual Term Term { get; }
Property Value
Transpositions
Returns true if transpositions should be treated as a primitive edit operation.
If this is false, comparisons will implement the classic Levenshtein algorithm.
Declaration
public virtual bool Transpositions { get; }
Property Value
Methods
Equals(object)
Determines whether the specified object is equal to the current object.
Declaration
public override bool Equals(object obj)
Parameters
Type | Name | Description
object | obj | The object to compare with the current object.
Returns
Type | Description
bool | true if the specified object is equal to the current object; otherwise, false.
Overrides
GetHashCode()
Serves as the default hash function.
Declaration
public override int GetHashCode()
Returns
Type | Description
int | A hash code for the current object.
Overrides
GetTermsEnum(Terms, AttributeSource)
Constructs the enumeration to be used, expanding the
pattern term. This method should only be called if
the field exists (i.e., implementations can assume the
field does exist). This method should not return null
(it should instead return EMPTY if no
terms match). The TermsEnum must already be
positioned to the first matching term.
The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to
provide attributes that the rewrite method uses to communicate information such as
the maximum competitive boost. This is currently only used by TopTermsRewrite<Q>.
Declaration
protected override TermsEnum GetTermsEnum(Terms terms, AttributeSource atts)
Parameters
Returns
Overrides
SingleToEdits(float, int)
Helper function to convert from deprecated "minimumSimilarity" fractions
to raw edit distances.
NOTE: this was floatToEdits() in Lucene
Declaration
[Obsolete("pass integer edit distances instead.")]
public static int SingleToEdits(float minimumSimilarity, int termLen)
Parameters
Type | Name | Description
float | minimumSimilarity | Scaled similarity.
int | termLen | Length (in Unicode code points) of the term.
Returns
Type | Description
int | Equivalent number of maxEdits.
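The conversion can be sketched as a self-contained function. This is an assumption-laden illustration, not the library's code: the logic follows the commonly cited behavior of floatToEdits() in Java Lucene (values >= 1 are treated as raw edit counts, 0 means an exact match, and fractions are scaled by the term length), the constant 2 for MAXIMUM_SUPPORTED_DISTANCE is assumed rather than stated on this page, and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical stand-in for the obsolete SingleToEdits helper.
public static class SimilarityConversionSketch
{
    // Assumed value of MAXIMUM_SUPPORTED_DISTANCE; not confirmed by this page.
    private const int MaxSupportedDistance = 2;

    public static int ToEdits(float minimumSimilarity, int termLen)
    {
        if (minimumSimilarity >= 1f)
        {
            // Values >= 1 are read as a raw edit count, capped at the maximum.
            return (int)Math.Min(minimumSimilarity, MaxSupportedDistance);
        }
        if (minimumSimilarity == 0.0f)
        {
            // Assumed: 0 means an exact match, not an unlimited number of edits.
            return 0;
        }
        // Fractions are scaled by the term length, truncated, then capped.
        return Math.Min((int)((1d - minimumSimilarity) * termLen), MaxSupportedDistance);
    }
}
```

Under these assumptions, a similarity of 0.5 on a four-character term corresponds to two edits, and the same similarity on a two-character term corresponds to one edit.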
ToString(string)
Prints a query to a string, with field assumed to be the
default field and omitted.
Declaration
public override string ToString(string field)
Parameters
Type | Name | Description
string | field |
Returns
Overrides