Lucene.Net  3.0.3
Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users.
Lucene.Net.Search.Similar.SimilarityQueries Class Reference

Simple similarity measures.

Static Public Member Functions

static Query FormSimilarQuery (System.String body, Analyzer a, System.String field, ISet<string> stop)
 Simple similarity query generator. Takes every unique word in the body and forms a boolean query in which all words are optional. Use the resulting query to search your IndexSearcher for similar documents. The only caveat is that the first hit returned should be your source document, which you will then need to ignore.
 

Detailed Description

Simple similarity measures.

See Also
Lucene.Net.Search.Similar.MoreLikeThis

Definition at line 33 of file SimilarityQueries.cs.

Member Function Documentation

static Query Lucene.Net.Search.Similar.SimilarityQueries.FormSimilarQuery (System.String body, Analyzer a, System.String field, ISet<string> stop)

Simple similarity query generator. Takes every unique word in the body and forms a boolean query in which all words are optional. Use the resulting query to search your IndexSearcher for similar documents. The only caveat is that the first hit returned should be your source document, which you will then need to ignore.

So, if you have a code fragment like this:
Query q = SimilarityQueries.FormSimilarQuery("I use Lucene to search fast. Fast searchers are good", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), "contents", null);

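The query returned, in string form, will be along the lines of '(i use lucene to search fast searchers are good)'. The exact terms depend on the analyzer, which may drop stop words of its own (such as "to" and "are") before the query is built.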
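
The unique-word step can be sketched without Lucene.Net at all. The following is a minimal illustration, not the library's implementation: `SimilarQuerySketch` and `UniqueTerms` are hypothetical names, and lowercasing plus splitting on non-alphanumeric characters is an assumed stand-in for whatever the Analyzer would actually do.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimilarQuerySketch
{
    // Hypothetical helper, not part of Lucene.Net. A regex over the lowercased
    // text stands in for the Analyzer; a real analyzer may tokenize differently.
    public static List<string> UniqueTerms(string body, ISet<string> stop)
    {
        var seen = new HashSet<string>();
        var terms = new List<string>();
        foreach (Match m in Regex.Matches(body.ToLowerInvariant(), "[a-z0-9]+"))
        {
            string word = m.Value;
            if (stop != null && stop.Contains(word)) continue; // skip optional stop words
            if (!seen.Add(word)) continue;                      // skip words already seen
            terms.Add(word);                                    // each term would become an optional clause
        }
        return terms;
    }
}
```

Calling `SimilarQuerySketch.UniqueTerms("I use Lucene to search fast. Fast searchers are good", null)` yields the nine terms i, use, lucene, to, search, fast, searchers, are, good, with the second "fast" dropped as a duplicate.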

The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.
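
The preference for uncommon words comes from the inverse document frequency factor in scoring. As a rough sketch, assuming the default Similarity is in use, that factor is 1 + ln(numDocs / (docFreq + 1)). `IdfSketch` is a hypothetical name for illustration, and it returns a double where Lucene works in floats.

```csharp
using System;

public static class IdfSketch
{
    // Inverse document frequency in the style of Lucene's default Similarity:
    // 1 + ln(numDocs / (docFreq + 1)). Rarer terms get a larger weight.
    public static double Idf(int docFreq, int numDocs)
    {
        return 1.0 + Math.Log(numDocs / (double)(docFreq + 1));
    }
}
```

In an index of 1000 documents, a word found in 9 of them gets a weight of about 5.61, while a word found in 499 gets about 1.69, so a shared rare word contributes far more to the score than a shared common one.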

This method is fail-safe: if a long 'body' is passed in and BooleanQuery.Add (used internally) throws BooleanQuery.TooManyClauses, the query built up to that point is returned.
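
The fail-safe pattern described above can be illustrated with stand-in types. In this sketch `BoundedClauses` and `TooManyClausesException` are hypothetical substitutes for BooleanQuery and BooleanQuery.TooManyClauses, used only so the example runs without Lucene.Net; it shows the catch-and-keep-what-we-have idea, not the library's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for BooleanQuery.TooManyClauses.
public class TooManyClausesException : Exception { }

// Hypothetical stand-in for a BooleanQuery with a clause limit.
public class BoundedClauses
{
    private readonly int _max;
    public List<string> Items { get; } = new List<string>();

    public BoundedClauses(int max) { _max = max; }

    public void Add(string clause)
    {
        if (Items.Count >= _max) throw new TooManyClausesException();
        Items.Add(clause);
    }
}

public static class FailSafeSketch
{
    // Adds words until the limit is hit, then returns the partial result
    // instead of propagating the exception.
    public static List<string> Build(IEnumerable<string> words, int maxClauses)
    {
        var clauses = new BoundedClauses(maxClauses);
        foreach (string word in words)
        {
            try { clauses.Add(word); }
            catch (TooManyClausesException) { break; } // keep what was added so far
        }
        return clauses.Items;
    }
}
```

With a limit of 2, `FailSafeSketch.Build(new[] { "a", "b", "c", "d" }, 2)` returns only "a" and "b". The practical consequence is that a very long body yields a query covering only its leading words.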

Parameters
body   the body of the document you want to find similar documents to
a      the analyzer to use to parse the body
field  the field you want to search on, probably something like "contents" or "body"
stop   optional set of stop words to ignore
Returns
a query with all unique words in 'body'

Exceptions
IOException  this can't happen...
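
Putting the pieces together, a search for similar documents might look like the sketch below. It assumes the Lucene.Net 3.0.3 API (`IndexSearcher.Search(Query, int)`, `TopDocs.ScoreDocs`, `ScoreDoc.Doc`) and that the caller already knows the source document's id. `SimilarDocsSketch` and `FindSimilar` are hypothetical names, not part of the library.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Search;
using Lucene.Net.Search.Similar;

public static class SimilarDocsSketch
{
    // Hypothetical helper: returns up to maxHits ids of documents similar to
    // body, leaving out the source document itself.
    public static List<int> FindSimilar(IndexSearcher searcher, Analyzer analyzer,
                                        string body, string field,
                                        int sourceDocId, int maxHits)
    {
        Query query = SimilarityQueries.FormSimilarQuery(body, analyzer, field, null);

        // Ask for one extra hit so the source document can be dropped.
        TopDocs hits = searcher.Search(query, maxHits + 1);

        var similar = new List<int>();
        foreach (ScoreDoc hit in hits.ScoreDocs)
        {
            if (hit.Doc == sourceDocId) continue; // ignore the source document
            if (similar.Count == maxHits) break;
            similar.Add(hit.Doc);
        }
        return similar;
    }
}
```

Filtering by document id rather than discarding the first hit is a deliberate choice here: the source document should rank first, but nothing guarantees it.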

Definition at line 80 of file SimilarityQueries.cs.


The documentation for this class was generated from the following file: