Class Highlighter

Class used to mark up highlighted terms found in the best sections of a text, using a configurable IFragmenter, IScorer, IFormatter, IEncoder and tokenizers.

Inheritance

object

Highlighter

Inherited Members

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Search.Highlight

Assembly: Lucene.Net.Highlighter.dll

Syntax

public class Highlighter

Constructors

Highlighter(IFormatter, IEncoder, IScorer)

Creates a Highlighter that marks up terms with `formatter`, encodes the text with `encoder`, and scores fragments with `fragmentScorer`.

Declaration

public Highlighter(IFormatter formatter, IEncoder encoder, IScorer fragmentScorer)

Parameters

Type	Name	Description
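IFormatter	formatter	formats (marks up) the highlighted terms
IEncoder	encoder	encodes the document text before it is written to the output
IScorer	fragmentScorer	scores tokens and fragments to select the terms and sections to highlight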

Highlighter(IFormatter, IScorer)

Creates a Highlighter that marks up terms with `formatter` and scores fragments with `fragmentScorer`. Text is encoded with a DefaultEncoder.

Declaration

public Highlighter(IFormatter formatter, IScorer fragmentScorer)

Parameters

Type	Name	Description
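IFormatter	formatter	formats (marks up) the highlighted terms
IScorer	fragmentScorer	scores tokens and fragments to select the terms and sections to highlight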

Highlighter(IScorer)

Creates a Highlighter that scores fragments with `fragmentScorer` and marks up terms with a SimpleHTMLFormatter that uses its default tags. Text is encoded with a DefaultEncoder.

Declaration

public Highlighter(IScorer fragmentScorer)

Parameters

Type	Name	Description
IScorer	fragmentScorer	scores tokens and fragments to select the terms and sections to highlight

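The three constructors differ only in how much is supplied explicitly. The sketch below builds one Highlighter per overload. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Highlighter) are referenced. The `body` field, the query term and the class name are hypothetical.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;

public static class HighlighterConstructionExample
{
    // Hypothetical helper: a QueryScorer for a TermQuery on the "body" field.
    // A new scorer is created per highlighter because QueryScorer keeps per-document state.
    private static QueryScorer NewScorer() =>
        new QueryScorer(new TermQuery(new Term("body", "fox")), "body");

    public static Highlighter[] Create() => new[]
    {
        // Formatter, encoder and scorer all supplied explicitly.
        new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), new SimpleHTMLEncoder(), NewScorer()),
        // Formatter and scorer only; the encoder is defaulted.
        new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), NewScorer()),
        // Scorer only; formatter and encoder are defaulted.
        new Highlighter(NewScorer()),
    };
}
```

Passing a QueryScorer is the usual choice when the highlighted terms should come from the same Query that was used for searching.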
Fields

DEFAULT_MAX_CHARS_TO_ANALYZE

The default value of MaxDocCharsToAnalyze.

Declaration

public static readonly int DEFAULT_MAX_CHARS_TO_ANALYZE

Field Value

Type	Description
int

Properties

Encoder

Gets or sets the IEncoder applied to the document text (both the highlighted terms and the text between them) before it is written to the output.

Declaration

public virtual IEncoder Encoder { get; set; }

Property Value

Type	Description
IEncoder

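When the output is HTML, the document text itself usually needs escaping so that only the formatter's tags act as markup. The sketch below swaps in SimpleHTMLEncoder for that purpose. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, sample text and class name are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class EncoderExample
{
    public static string Run()
    {
        // Hypothetical document text containing characters that are special in HTML.
        const string text = "fox & hound <tale>";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer)
        {
            // Escape the document text so only the formatter's own tags are emitted as markup.
            Encoder = new SimpleHTMLEncoder()
        };

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        return highlighter.GetBestFragment(analyzer, "body", text);
    }
}
```

With a DefaultEncoder instead, the `&` and `<` in the sample text would be copied into the output unchanged.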
FragmentScorer

Gets or sets the IScorer used to score tokens and fragments. It determines which terms are highlighted and which fragments are considered best.

Declaration

public virtual IScorer FragmentScorer { get; set; }

Property Value

Type	Description
IScorer

MaxDocCharsToAnalyze

Gets or sets the maximum number of characters of the document text that are analyzed for highlighting. Tokens that start at or beyond this offset are ignored. Defaults to DEFAULT_MAX_CHARS_TO_ANALYZE.

Declaration

public virtual int MaxDocCharsToAnalyze { get; set; }

Property Value

Type	Description
int

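Lowering MaxDocCharsToAnalyze trades completeness for speed on long documents. The sketch below reads the default and then sets a limit small enough that the only matching term lies past it. Under that assumption, GetBestFragment is expected to find no scoring fragment and return null. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, text and class name are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class MaxDocCharsExample
{
    public static (int DefaultLimit, string Result) Run()
    {
        const string text = "alpha beta gamma fox";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer);

        // A freshly constructed highlighter starts with the default limit.
        int defaultLimit = highlighter.MaxDocCharsToAnalyze;

        // Only tokens starting before offset 10 are analyzed, so "fox" (offset 17) is never seen.
        highlighter.MaxDocCharsToAnalyze = 10;
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        string result = highlighter.GetBestFragment(analyzer, "body", text);
        return (defaultLimit, result);
    }
}
```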
TextFragmenter

Gets or sets the IFragmenter used to break the text into candidate fragments. Defaults to a SimpleFragmenter.

Declaration

public virtual IFragmenter TextFragmenter { get; set; }

Property Value

Type	Description
IFragmenter

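The fragmenter controls how large each returned section is. The sketch below replaces the default with NullFragmenter so that the whole text comes back as a single fragment. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, filler text and class name are hypothetical.

```csharp
using System.Linq;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class FragmenterExample
{
    public static string Run()
    {
        // 30 filler words followed by the query term: 183 characters in total.
        string text = string.Join(" ", Enumerable.Repeat("lorem", 30)) + " fox";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer)
        {
            // Never start a new fragment: the entire text is one fragment.
            TextFragmenter = new NullFragmenter()
        };

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        return highlighter.GetBestFragment(analyzer, "body", text);
    }
}
```

A SimpleFragmenter with a chosen size (for example `new SimpleFragmenter(50)`) is the usual middle ground between short snippets and the full text.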
Methods

GetBestFragment(Analyzer, string, string)

Highlights chosen terms in a text, extracting the most relevant section. This is a convenience method that obtains a TokenStream from `analyzer` for `fieldName` and calls GetBestFragment(TokenStream, string).

Declaration

public string GetBestFragment(Analyzer analyzer, string fieldName, string text)

Parameters

Type	Name	Description
Analyzer	analyzer	the analyzer that will be used to split `text` into chunks
string	fieldName	name of the field, used to influence the analyzer's tokenization policy
string	text	text to highlight terms in

Returns

Type	Description
string	highlighted text fragment or null if no terms found

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length

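A typical call re-analyzes the stored text of a search hit with the same analyzer that was used at index time. The sketch below does this for a single term query. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The `body` field, text and class name are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class BestFragmentExample
{
    public static string Run()
    {
        const string text = "The quick brown fox jumps over the lazy dog";
        var query = new TermQuery(new Term("body", "fox"));
        var highlighter = new Highlighter(
            new SimpleHTMLFormatter("<em>", "</em>"),
            new QueryScorer(query, "body"));

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        // Returns null when none of the query terms occur in the text.
        return highlighter.GetBestFragment(analyzer, "body", text);
    }
}
```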
GetBestFragment(TokenStream, string)

Highlights chosen terms in a text, extracting the most relevant section. The document text is analysed in chunks to record hit statistics across the document. After accumulating stats, the fragment with the highest score is returned.

Declaration

public string GetBestFragment(TokenStream tokenStream, string text)

Parameters

Type	Name	Description
TokenStream	tokenStream	A stream of tokens identified in the `text` parameter, including offset information. This is typically produced by an analyzer re-parsing the document's text. The remark that storing original text position data in the index was not yet supported dates from Lucene 1.4 rc2; when term vectors with offsets are stored, TokenSources can build the stream from them instead.
string	text	text to highlight terms in

Returns

Type	Description
string	highlighted text fragment or null if no terms found

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length

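This overload takes the TokenStream directly, which is useful when the stream comes from somewhere other than a plain analyzer call. The sketch below obtains one from an analyzer and passes it in unconsumed, since the highlighter resets and reads it itself. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, text and class name are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class TokenStreamFragmentExample
{
    public static string Run()
    {
        const string text = "The quick brown fox jumps over the lazy dog";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer);

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        // The stream's offsets must point into `text`, so it is built from the same string.
        TokenStream tokenStream = analyzer.GetTokenStream("body", new StringReader(text));
        return highlighter.GetBestFragment(tokenStream, text);
    }
}
```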
GetBestFragments(Analyzer, string, string, int)

Highlights chosen terms in a text, extracting the most relevant sections. This is a convenience method that obtains a TokenStream from `analyzer` for `fieldName` and calls GetBestFragments(TokenStream, string, int).

Declaration

public string[] GetBestFragments(Analyzer analyzer, string fieldName, string text, int maxNumFragments)

Parameters

Type	Name	Description
Analyzer	analyzer	the analyzer that will be used to split `text` into chunks
string	fieldName	the name of the field being highlighted (used by analyzer)
string	text	text to highlight terms in
int	maxNumFragments	the maximum number of fragments.

Returns

Type	Description
string[]	highlighted text fragments (between 0 and `maxNumFragments` fragments)

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length

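To get several short snippets instead of one section, combine a small fragment size with `maxNumFragments`. The sketch below asks for up to three fragments of roughly 20 characters each. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, text, fragment size and class name are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class BestFragmentsExample
{
    public static string[] Run()
    {
        const string text = "A fox lives in the forest. Rain fell all afternoon on the hills. Another fox hunts at night.";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer)
        {
            // Break the text into candidate fragments of about 20 characters.
            TextFragmenter = new SimpleFragmenter(20)
        };

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        // At most 3 fragments, best first; contiguous fragments may be merged.
        return highlighter.GetBestFragments(analyzer, "body", text, 3);
    }
}
```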
GetBestFragments(TokenStream, string, int)

Highlights chosen terms in a text, extracting the most relevant sections. The document text is analysed in chunks to record hit statistics across the document. After accumulating stats, the fragments with the highest scores are returned as an array of strings in order of score (contiguous fragments are merged into one, in their original order, to improve readability).

Declaration

public string[] GetBestFragments(TokenStream tokenStream, string text, int maxNumFragments)

Parameters

Type	Name	Description
TokenStream	tokenStream	a stream of tokens identified in the `text` parameter, including offset information
string	text	text to highlight terms in
int	maxNumFragments	the maximum number of fragments.

Returns

Type	Description
string[]	highlighted text fragments (between 0 and `maxNumFragments` fragments)

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length

GetBestFragments(TokenStream, string, int, string)

Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). The document text is analysed in chunks to record hit statistics across the document. After accumulating stats, the fragments with the highest scores are joined in order with `separator` and returned as a single string.

Declaration

public virtual string GetBestFragments(TokenStream tokenStream, string text, int maxNumFragments, string separator)

Parameters

Type	Name	Description
TokenStream	tokenStream	a stream of tokens identified in the `text` parameter, including offset information
string	text	text to highlight terms in
int	maxNumFragments	the maximum number of fragments.
string	separator	the separator used to intersperse the document fragments (typically "...")

Returns

Type	Description
string	highlighted text

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length

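For a single display string such as a search-result summary, this overload joins the chosen fragments with a separator. The sketch below uses "..." as the separator. It assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, text, fragment size and class name are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class SeparatedFragmentsExample
{
    public static string Run()
    {
        const string text = "A fox lives in the forest. Rain fell all afternoon on the hills. Another fox hunts at night.";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer)
        {
            TextFragmenter = new SimpleFragmenter(20)
        };

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        TokenStream tokenStream = analyzer.GetTokenStream("body", new StringReader(text));
        // Up to 3 fragments, joined with "..." between them.
        return highlighter.GetBestFragments(tokenStream, text, 3, "...");
    }
}
```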
GetBestTextFragments(TokenStream, string, bool, int)

Low-level API to get the most relevant (formatted) sections of the document. This method is public so that callers can see the score information held in TextFragment objects. Thanks to Jason Calabrese for help in redefining the interface.

Declaration

public TextFragment[] GetBestTextFragments(TokenStream tokenStream, string text, bool mergeContiguousFragments, int maxNumFragments)

Parameters

Type	Name	Description
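TokenStream	tokenStream	a stream of tokens identified in the `text` parameter, including offset information
string	text	text to highlight terms in
bool	mergeContiguousFragments	whether adjacent fragments are merged into one, in their original order
int	maxNumFragments	the maximum number of fragments.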

Returns

Type	Description
TextFragment[]	the best-scoring fragments of the document, each exposing its score

Exceptions

Type	Condition
IOException	If there is a low-level I/O error
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length
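Use this overload when you need each fragment's score, for example to drop weak snippets. The sketch below requests unmerged fragments and keeps only entries that are non-null and have a positive Score. The assumption is that, without merging, the array may also contain fragments that matched no query terms. It further assumes the Lucene.Net 4.8 packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Highlighter) are referenced. The field name, text, fragment size and class name are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class TextFragmentsExample
{
    public static List<(float Score, string Fragment)> Run()
    {
        const string text = "A fox lives in the forest. Rain fell all afternoon on the hills. Another fox hunts at night.";
        var scorer = new QueryScorer(new TermQuery(new Term("body", "fox")), "body");
        var highlighter = new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer)
        {
            TextFragmenter = new SimpleFragmenter(20)
        };

        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        TokenStream tokenStream = analyzer.GetTokenStream("body", new StringReader(text));
        TextFragment[] frags = highlighter.GetBestTextFragments(tokenStream, text, false, 3);

        var results = new List<(float Score, string Fragment)>();
        foreach (TextFragment frag in frags)
        {
            // Skip empty slots and fragments that contain no query terms.
            if (frag != null && frag.Score > 0)
                results.Add((frag.Score, frag.ToString()));
        }
        return results;
    }
}
```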