Class Highlighter

Marks up highlighted terms found in the best sections of a text, using a configurable IFragmenter, IScorer, IFormatter, IEncoder and tokenizer.
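The following is a minimal sketch of how the configurable pieces fit together. It assumes Lucene.Net 4.8 with the Lucene.Net.Highlighter and Lucene.Net.Analysis.Common packages referenced. The field name `body`, the term `lucene` and the class `HighlighterSetupExample` are hypothetical. QueryScorer acts as the IScorer, SimpleHTMLFormatter as the IFormatter, SimpleHTMLEncoder as the IEncoder, SimpleSpanFragmenter as the IFragmenter, and a StandardAnalyzer supplies the tokenizer.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class HighlighterSetupExample
{
    // Highlights the (hypothetical) term "lucene" in text from a "body" field.
    public static string Highlight(string text)
    {
        var query = new TermQuery(new Term("body", "lucene"));

        // Scorer: scores tokens and fragments by the query terms they contain.
        var scorer = new QueryScorer(query, "body");

        // Formatter: wraps each matching term in <em>...</em>.
        var formatter = new SimpleHTMLFormatter("<em>", "</em>");

        // Encoder: HTML-escapes the original text, e.g. "<" becomes "&lt;".
        var encoder = new SimpleHTMLEncoder();

        var highlighter = new Highlighter(formatter, encoder, scorer)
        {
            // Fragmenter: roughly 100-character fragments that try not to
            // split span (e.g. phrase) matches.
            TextFragmenter = new SimpleSpanFragmenter(scorer, 100)
        };

        // Tokenizer: the analyzer re-tokenizes the text, recording offsets.
        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            return highlighter.GetBestFragment(analyzer, "body", text);
        }
    }
}
```

For example, `HighlighterSetupExample.Highlight("x < y in Lucene")` should yield a fragment in which `<` is escaped as `&lt;` and the matching word appears as `<em>Lucene</em>`. The original casing is preserved because markup is applied at token offsets in the original text.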

Inheritance

System.Object

Highlighter

Inherited Members

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: Lucene.Net.Search.Highlight

Assembly: Lucene.Net.Highlighter.dll

Syntax

public class Highlighter

Constructors

Highlighter(IFormatter, IEncoder, IScorer)

Declaration

public Highlighter(IFormatter formatter, IEncoder encoder, IScorer fragmentScorer)

Parameters

Type	Name	Description
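IFormatter	formatter	Marks up each highlighted term (for example, by wrapping it in HTML tags)
IEncoder	encoder	Encodes the original text before it is marked up
IScorer	fragmentScorer	Scores tokens and fragments to select the best fragments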

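Highlighter(IFormatter, IScorer)

Creates a highlighter with the given formatter and a DefaultEncoder, which leaves the text unescaped.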

Declaration

public Highlighter(IFormatter formatter, IScorer fragmentScorer)

Parameters

Type	Name	Description
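IFormatter	formatter	Marks up each highlighted term (for example, by wrapping it in HTML tags)
IScorer	fragmentScorer	Scores tokens and fragments to select the best fragments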

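Highlighter(IScorer)

Creates a highlighter that uses a SimpleHTMLFormatter (which wraps terms in <B>...</B>) and a DefaultEncoder.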

Declaration

public Highlighter(IScorer fragmentScorer)

Parameters

Type	Name	Description
IScorer	fragmentScorer	Scores tokens and fragments to select the best fragments

Fields

DEFAULT_MAX_CHARS_TO_ANALYZE

Declaration

public static readonly int DEFAULT_MAX_CHARS_TO_ANALYZE

Field Value

Type	Description
System.Int32	Default value of MaxDocCharsToAnalyze: 50 * 1024 (51,200) characters.

Properties

Encoder

Declaration

public virtual IEncoder Encoder { get; set; }

Property Value

Type	Description
IEncoder	Encodes the original text (for example, HTML-escaping it) before terms are marked up.

FragmentScorer

Declaration

public virtual IScorer FragmentScorer { get; set; }

Property Value

Type	Description
IScorer	Scores tokens and fragments; used to select the best fragments.

MaxDocCharsToAnalyze

Declaration

public virtual int MaxDocCharsToAnalyze { get; set; }

Property Value

Type	Description
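System.Int32	Maximum number of characters of the text to analyze; defaults to DEFAULT_MAX_CHARS_TO_ANALYZE. Tokens that start beyond this limit are not highlighted.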
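A minimal sketch of the effect of MaxDocCharsToAnalyze, assuming Lucene.Net 4.8 with the highlighter and common-analysis packages. The field `body`, the term `lucene` and the class `MaxCharsExample` are hypothetical. With a small limit, a query term that appears late in the text is never reached, so no highlighted fragment is produced.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class MaxCharsExample
{
    // Returns the best fragment, analyzing at most maxChars characters of text.
    public static string Highlight(string text, int maxChars)
    {
        var query = new TermQuery(new Term("body", "lucene"));
        var highlighter = new Highlighter(new QueryScorer(query, "body"))
        {
            // Tokens that start at or after this offset are not analyzed,
            // so query terms there are never highlighted.
            MaxDocCharsToAnalyze = maxChars
        };

        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Null when no query term is found in the analyzed prefix.
            return highlighter.GetBestFragment(analyzer, "body", text);
        }
    }
}
```

Raising the limit makes highlighting of long documents more complete at the cost of re-analyzing more text per hit.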

TextFragmenter

Declaration

public virtual IFragmenter TextFragmenter { get; set; }

Property Value

Type	Description
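IFragmenter	Splits the text into candidate fragments; defaults to a SimpleFragmenter with 100-character fragments.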

Methods

GetBestFragment(Analyzer, String, String)

Highlights chosen terms in a text, extracting the most relevant section. This is a convenience method that calls GetBestFragment(TokenStream, String).

Declaration

public string GetBestFragment(Analyzer analyzer, string fieldName, string text)

Parameters

Type	Name	Description
Lucene.Net.Analysis.Analyzer	analyzer	the analyzer used to split `text` into tokens
System.String	fieldName	name of the field, used by the analyzer to choose its tokenization policy
System.String	text	text to highlight terms in

Returns

Type	Description
System.String	the highlighted text fragment, or null if no query terms were found

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length
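A minimal sketch of the convenience overload, assuming Lucene.Net 4.8. The field `body`, the term `lucene` and the class `BestFragmentExample` are hypothetical. The single-argument constructor is used, so matches are wrapped in the default `<B>...</B>` tags.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class BestFragmentExample
{
    public static string Highlight(string text)
    {
        var query = new TermQuery(new Term("body", "lucene"));

        // Default formatter (<B>...</B>) and default fragmenter.
        var highlighter = new Highlighter(new QueryScorer(query, "body"));

        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // The analyzer tokenizes `text` internally; null means no hit.
            return highlighter.GetBestFragment(analyzer, "body", text);
        }
    }
}
```

Callers should check for null, since a text without any query term yields no fragment.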

GetBestFragment(TokenStream, String)

Highlights chosen terms in a text, extracting the most relevant section. The document text is analysed in chunks to record hit statistics across the document. After accumulating stats, the fragment with the highest score is returned.

Declaration

public string GetBestFragment(TokenStream tokenStream, string text)

Parameters

Type	Name	Description
Lucene.Net.Analysis.TokenStream	tokenStream	A stream of tokens for the `text` parameter, including offset information. This is typically produced by an analyzer re-parsing the document's text; it can also be rebuilt from term vectors stored with offsets (see TokenSources).
System.String	text	text to highlight terms in

Returns

Type	Description
System.String	the highlighted text fragment, or null if no query terms were found

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length
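A minimal sketch of the TokenStream overload, assuming Lucene.Net 4.8. The field `body`, the term `lucene` and the class `TokenStreamFragmentExample` are hypothetical. The stream is obtained by re-analyzing the same text, so its offsets line up with `text`.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class TokenStreamFragmentExample
{
    public static string Highlight(string text)
    {
        var query = new TermQuery(new Term("body", "lucene"));
        var highlighter = new Highlighter(new QueryScorer(query, "body"));

        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Re-analyze the same text so token offsets match it; an offset
            // past the end of `text` would cause InvalidTokenOffsetsException.
            TokenStream tokenStream = analyzer.GetTokenStream("body", new StringReader(text));
            return highlighter.GetBestFragment(tokenStream, text);
        }
    }
}
```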

GetBestFragments(Analyzer, String, String, Int32)

Highlights chosen terms in a text, extracting the most relevant sections. This is a convenience method that calls GetBestFragments(TokenStream, String, Int32).

Declaration

public string[] GetBestFragments(Analyzer analyzer, string fieldName, string text, int maxNumFragments)

Parameters

Type	Name	Description
Lucene.Net.Analysis.Analyzer	analyzer	the analyzer that will be used to split `text` into chunks
System.String	fieldName	the name of the field being highlighted (used by analyzer)
System.String	text	text to highlight terms in
System.Int32	maxNumFragments	the maximum number of fragments.

Returns

Type	Description
System.String[]	highlighted text fragments; between 0 and `maxNumFragments` fragments

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length
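A minimal sketch of requesting several fragments, assuming Lucene.Net 4.8. The field `body`, the term `lucene` and the class `BestFragmentsExample` are hypothetical. A SimpleFragmenter with roughly 20-character fragments is set so that a longer text yields more than one candidate fragment.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class BestFragmentsExample
{
    public static string[] Highlight(string text, int maxNumFragments)
    {
        var query = new TermQuery(new Term("body", "lucene"));
        var highlighter = new Highlighter(new QueryScorer(query, "body"))
        {
            // Small fragments so distant hits land in separate fragments.
            TextFragmenter = new SimpleFragmenter(20)
        };

        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Only fragments containing at least one query term are returned.
            return highlighter.GetBestFragments(analyzer, "body", text, maxNumFragments);
        }
    }
}
```

The returned array may be shorter than `maxNumFragments`, or empty, when fewer fragments contain hits.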

GetBestFragments(TokenStream, String, Int32)

Highlights chosen terms in a text, extracting the most relevant sections. The document text is analysed in chunks to record hit statistics across the document. After accumulating stats, the fragments with the highest scores are returned as an array of strings in order of score (contiguous fragments are merged into one, in their original order, to improve readability).

Declaration

public string[] GetBestFragments(TokenStream tokenStream, string text, int maxNumFragments)

Parameters

Type	Name	Description
Lucene.Net.Analysis.TokenStream	tokenStream	a stream of tokens for `text`, including offset information
System.String	text	text to highlight terms in
System.Int32	maxNumFragments	the maximum number of fragments.

Returns

Type	Description
System.String[]	highlighted text fragments; between 0 and `maxNumFragments` fragments

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length

GetBestFragments(TokenStream, String, Int32, String)

Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). The document text is analysed in chunks to record hit statistics across the document. After accumulating stats, the fragments with the highest scores are returned in order of score as a single string, delimited by `separator`.

Declaration

public virtual string GetBestFragments(TokenStream tokenStream, string text, int maxNumFragments, string separator)

Parameters

Type	Name	Description
Lucene.Net.Analysis.TokenStream	tokenStream	a stream of tokens for `text`, including offset information
System.String	text	text to highlight terms in
System.Int32	maxNumFragments	the maximum number of fragments.
System.String	separator	the separator used to intersperse the document fragments (typically "...")

Returns

Type	Description
System.String	the highlighted fragments, joined by `separator`

Exceptions

Type	Condition
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length
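A minimal sketch of building a single search-result snippet, assuming Lucene.Net 4.8. The field `body`, the term `lucene`, the `" ... "` separator and the class `SeparatorExample` are illustrative choices. Two hits far apart in the text end up in separate fragments, which are joined by the separator.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class SeparatorExample
{
    public static string Highlight(string text)
    {
        var query = new TermQuery(new Term("body", "lucene"));
        var highlighter = new Highlighter(new QueryScorer(query, "body"))
        {
            // Small fragments so distant hits are not merged together.
            TextFragmenter = new SimpleFragmenter(20)
        };

        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            TokenStream tokenStream = analyzer.GetTokenStream("body", new StringReader(text));
            // Up to 3 fragments, joined into one string with " ... ".
            return highlighter.GetBestFragments(tokenStream, text, 3, " ... ");
        }
    }
}
```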

GetBestTextFragments(TokenStream, String, Boolean, Int32)

Low-level API to get the most relevant (formatted) sections of the document. This method is public to expose the score information held in TextFragment objects. Thanks to Jason Calabrese for help in redefining the interface.

Declaration

public TextFragment[] GetBestTextFragments(TokenStream tokenStream, string text, bool mergeContiguousFragments, int maxNumFragments)

Parameters

Type	Name	Description
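Lucene.Net.Analysis.TokenStream	tokenStream	a stream of tokens for `text`, including offset information
System.String	text	text to highlight terms in
System.Boolean	mergeContiguousFragments	whether to merge adjacent fragments into one to improve readability
System.Int32	maxNumFragments	the maximum number of fragments.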

Returns

Type	Description
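TextFragment[]	the highest-scoring fragments; when `mergeContiguousFragments` is true, contiguous fragments are merged and fragments with a score of 0 are dropped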

Exceptions

Type	Condition
System.IO.IOException	If there is a low-level I/O error
InvalidTokenOffsetsException	thrown if any token's EndOffset exceeds the provided text's length
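A minimal sketch of reading fragment scores through the low-level API, assuming Lucene.Net 4.8. The field `body`, the term `lucene` and the class `TextFragmentsExample` are hypothetical. Merging is disabled here, so the result can include fragments that contain no query term. The loop keeps only fragments with a positive Score.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Util;

public static class TextFragmentsExample
{
    // Returns the marked-up text and score of every fragment that has a hit.
    public static List<(string Text, float Score)> Fragments(string text)
    {
        var query = new TermQuery(new Term("body", "lucene"));
        var highlighter = new Highlighter(new QueryScorer(query, "body"))
        {
            TextFragmenter = new SimpleFragmenter(20)
        };
        var results = new List<(string Text, float Score)>();

        using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            TokenStream tokenStream = analyzer.GetTokenStream("body", new StringReader(text));
            TextFragment[] fragments =
                highlighter.GetBestTextFragments(tokenStream, text, false, 5);

            foreach (TextFragment fragment in fragments)
            {
                // Unmerged results may include fragments scoring 0 (no query
                // terms); the null check is purely defensive.
                if (fragment != null && fragment.Score > 0)
                {
                    results.Add((fragment.ToString(), fragment.Score));
                }
            }
        }
        return results;
    }
}
```

Exposing the score this way lets a caller apply its own cut-off or ordering instead of relying on the string-returning overloads.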