Class ICUPostingsHighlighter

Simple highlighter that neither analyzes fields nor uses term vectors. Instead it requires fields to be indexed with Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses an ICU4N.Text.BreakIterator to find passages in the text; by default it breaks on sentences using GetSentenceInstance(CultureInfo). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.

You can customize the behavior by subclassing this highlighter. Some important hooks:

GetBreakIterator(string): Customize how the text is divided into passages.
GetScorer(string): Customize how passages are ranked.
GetFormatter(string): Customize how snippets are formatted.
GetIndexAnalyzer(string): Enable highlighting of MultiTermQuery instances such as Lucene.Net.Search.WildcardQuery.

WARNING: The code is very new and probably still has some exciting bugs!

Example usage:

// configure field with offsets at index time
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
Field body = new Field("body", "foobar", offsetsType);
// retrieve highlights at query time
ICUPostingsHighlighter highlighter = new ICUPostingsHighlighter();
Query query = new TermQuery(new Term("body", "highlighting"));
TopDocs topDocs = searcher.Search(query, n);
string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);

This class is thread-safe and can be used across different readers.

Note that the .NET implementation differs from the PostingsHighlighter in Lucene in that it is backed by an ICU ICU4N.Text.RuleBasedBreakIterator, whose default behavior differs slightly from that of the one in the JDK. However, the ICU ICU4N.Text.RuleBasedBreakIterator can be customized to handle many scenarios that the one in the JDK cannot. See the ICU documentation at http://userguide.icu-project.org/boundaryanalysis/break-rules for more information on how to pass custom rules to an ICU ICU4N.Text.RuleBasedBreakIterator.

Note

This API is experimental and might change in incompatible ways in the next release.

Creates a new highlighter with DEFAULT_MAX_LENGTH.

Creates a new highlighter, specifying maximum content length.

Type	Name	Description
int	maxLength	maximum content size to process.

Type	Condition
ArgumentException	if `maxLength` is negative or `int.MaxValue`

Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.

Type	Description
int

Returns the ICU4N.Text.BreakIterator to use for dividing text into passages. By default this returns a sentence iterator created with GetSentenceInstance(CultureInfo); subclasses can override to customize.

Type	Name	Description
string	field

Type	Description
BreakIterator
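
As a minimal sketch of overriding this hook, the subclass below chooses the sentence iterator's culture per field. It assumes the hook is declared `protected virtual` and that the highlighter lives in the `Lucene.Net.Search.PostingsHighlight` namespace; the class name `LocalizedHighlighter` and the field name `body_de` are hypothetical and used only for illustration.

```csharp
using System.Globalization;
using ICU4N.Text;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: picks the sentence-breaking culture per field.
public class LocalizedHighlighter : ICUPostingsHighlighter
{
    protected override BreakIterator GetBreakIterator(string field)
    {
        // "body_de" is an illustrative field name, not part of the library.
        CultureInfo culture = field == "body_de"
            ? new CultureInfo("de-DE")
            : CultureInfo.InvariantCulture;

        // Same factory the documentation names as the default.
        return BreakIterator.GetSentenceInstance(culture);
    }
}
```

A fresh iterator is returned on every call, so no iterator state is shared between concurrent highlight requests.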

Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize.

Type	Name	Description
string	fieldName
BreakIterator	bi
int	maxPassages

Type	Description
Passage[]
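
One possible customization, sketched under assumptions, is to suppress the fallback summary entirely. The member name `GetEmptyHighlight` is taken from the Lucene API this class is ported from and is an assumption here, as is the hypothetical class name `NoFallbackHighlighter`. The expected effect (also an assumption) is that documents without hits produce no snippet rather than their leading sentences.

```csharp
using System;
using ICU4N.Text;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: returns no passages when a document has no hits.
public class NoFallbackHighlighter : ICUPostingsHighlighter
{
    // Member name assumed from the upstream Lucene API.
    protected override Passage[] GetEmptyHighlight(
        string fieldName, BreakIterator bi, int maxPassages)
    {
        // An empty array means there is nothing to format for this document.
        return Array.Empty<Passage>();
    }
}
```

Callers using such a subclass should be prepared for missing (possibly null) entries in the returned snippet array.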

Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.

Type	Name	Description
string	field

Type	Description
PassageFormatter
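
The following sketch shows how this hook might be overridden to change the highlight markup. It assumes a `DefaultPassageFormatter` class with a `(preTag, postTag, ellipsis, escape)` constructor, as in the Lucene API this class is ported from; the class name `EmHighlighter` is hypothetical.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: wraps matches in <em> tags instead of the default markup.
public class EmHighlighter : ICUPostingsHighlighter
{
    protected override PassageFormatter GetFormatter(string field)
    {
        // Arguments (assumed order): tag before a match, tag after a match,
        // text placed between non-adjacent passages, and whether to
        // HTML-escape the surrounding content.
        return new DefaultPassageFormatter("<em>", "</em>", " ... ", true);
    }
}
```

Enabling escaping is a reasonable choice when snippets are rendered directly into HTML, since the stored field text may contain markup characters.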

Returns the analyzer originally used to index the content for field.

This is used to highlight some Lucene.Net.Search.MultiTermQuery instances.

Type	Name	Description
string	field

Type	Description
Analyzer	Lucene.Net.Analysis.Analyzer or null (the default, meaning no special multi-term processing)
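
A minimal sketch of enabling multi-term highlighting is shown below. It assumes the index was built with `StandardAnalyzer` from the Lucene.Net.Analysis.Common package; substitute whichever analyzer was actually used at index time. The class name `WildcardAwareHighlighter` is hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Search.PostingsHighlight;
using Lucene.Net.Util;

// Hypothetical subclass: supplies the index-time analyzer so that
// wildcard and other multi-term queries can be highlighted.
public class WildcardAwareHighlighter : ICUPostingsHighlighter
{
    // Assumption: the same analyzer configuration was used when indexing.
    private readonly Analyzer indexAnalyzer =
        new StandardAnalyzer(LuceneVersion.LUCENE_48);

    protected override Analyzer GetIndexAnalyzer(string field)
    {
        // Returning a non-null analyzer turns on multi-term processing.
        return indexAnalyzer;
    }
}
```

Returning an analyzer that differs from the one used at index time may cause highlighted offsets to disagree with the stored text, so the two should be kept in sync.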

Returns the logical separator between values for multi-valued fields. The default value is a space character, which means passages can span across values, but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting.

Type	Name	Description
string	field

Type	Description
char
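
The override described above could look like the sketch below, which returns U+2029 so that passages do not span values. The member name `GetMultiValuedSeparator` is assumed from the Lucene API this class is ported from, and the class name `PerValueHighlighter` is hypothetical.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: treats each value of a multi-valued field
// as its own passage boundary.
public class PerValueHighlighter : ICUPostingsHighlighter
{
    // Member name assumed from the upstream Lucene API.
    protected override char GetMultiValuedSeparator(string field)
    {
        // U+2029 PARAGRAPH SEPARATOR, as suggested in the description above.
        return '\u2029';
    }
}
```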

Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.

Type	Name	Description
string	field

Type	Description
PassageScorer
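
As an illustration, a subclass might return a scorer with different ranking parameters. This sketch assumes `PassageScorer` exposes a three-argument constructor taking BM25-style `k1`, `b`, and `pivot` values, as in the Lucene API this class is ported from; the values and the class name `FlatLengthHighlighter` are illustrative only.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: reduces the influence of passage length on ranking.
public class FlatLengthHighlighter : ICUPostingsHighlighter
{
    protected override PassageScorer GetScorer(string field)
    {
        // Assumed parameter order: k1 (term-frequency saturation),
        // b (length normalization), pivot (reference passage length).
        return new PassageScorer(1.2f, 0.1f, 87f);
    }
}
```

Lowering `b` is one way to keep long sentences from being penalized heavily; whether that improves snippets depends on the content being highlighted.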

Highlights the top passages from a single field.

Type	Name	Description
string	field	field name to highlight. Must have a stored string value and also be indexed with offsets.
Query	query	query to highlight.
IndexSearcher	searcher	searcher that was previously used to execute the query.
TopDocs	topDocs	TopDocs containing the summary result documents to highlight.

Type	Description
string[]	Array of formatted snippets corresponding to the documents in `topDocs`. If no highlights were found for a document, the first sentence for the field will be returned.

Type	Condition
IOException	if an I/O error occurred during processing
ArgumentException	if `field` was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Highlights the top-N passages from a single field.

Type	Name	Description
string	field	field name to highlight. Must have a stored string value and also be indexed with offsets.
Query	query	query to highlight.
IndexSearcher	searcher	searcher that was previously used to execute the query.
TopDocs	topDocs	TopDocs containing the summary result documents to highlight.
int	maxPassages	The maximum number of top-N ranked passages used to form the highlighted snippets.

Type	Description
string[]	Array of formatted snippets corresponding to the documents in `topDocs`. If no highlights were found for a document, the first `maxPassages` sentences from the field will be returned.
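
For context, here is a hedged end-to-end sketch that indexes one document with offsets and requests up to two passages per hit. It assumes the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.ICU packages are referenced; the class name `HighlightDemo`, the sample text, and the field name are illustrative.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class HighlightDemo
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var dir = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);

        // Index time: store the text and record offsets in the postings.
        var offsetsType = new FieldType(TextField.TYPE_STORED)
        {
            IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
        };
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(version, analyzer)))
        {
            var doc = new Document();
            doc.Add(new Field("body",
                "The highlighter scores sentences. Unrelated text follows. " +
                "Each highlighter hit is merged into a passage.",
                offsetsType));
            writer.AddDocument(doc);
        }

        // Query time: search first, then highlight the same TopDocs.
        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("body", "highlighter"));
        TopDocs topDocs = searcher.Search(query, 10);

        var highlighter = new ICUPostingsHighlighter();
        // Ask for at most two top-ranked passages per document.
        string[] snippets = highlighter.Highlight("body", query, searcher, topDocs, 2);

        foreach (string snippet in snippets)
        {
            Console.WriteLine(snippet);
        }
    }
}
```

The returned array is parallel to `topDocs.ScoreDocs`, so `snippets[i]` belongs to the i-th hit.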

Type	Condition
IOException	if an I/O error occurred during processing
ArgumentException	if `field` was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Highlights the top passages from multiple fields.

Conceptually, this behaves as a more efficient form of:

IDictionary<string, string[]> m = new Dictionary<string, string[]>();
foreach (string field in fields)
{
    m[field] = Highlight(field, query, searcher, topDocs);
}
return m;

Type	Name	Description
string[]	fields	field names to highlight. Must have a stored string value and also be indexed with offsets.
Query	query	query to highlight.
IndexSearcher	searcher	searcher that was previously used to execute the query.
TopDocs	topDocs	TopDocs containing the summary result documents to highlight.

Type	Description
IDictionary<string, string[]>	A dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in `topDocs`. If no highlights were found for a document, the first sentence from the field will be returned.
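
To make the shape of the return value concrete, here is a small sketch that regroups the per-field snippet arrays by document. It relies only on the description above (one array per field, index i matching the i-th document); the SnippetPrinter class and its method are hypothetical helpers, not part of the library.

```csharp
using System;
using System.Collections.Generic;

public static class SnippetPrinter
{
    // Regroups per-field snippet arrays into one line per (document, field) pair.
    // Assumes every array in `highlights` has at least `docCount` entries.
    public static List<string> ByDocument(IDictionary<string, string[]> highlights, int docCount)
    {
        var lines = new List<string>();
        for (int i = 0; i < docCount; i++)
        {
            foreach (KeyValuePair<string, string[]> entry in highlights)
            {
                // entry.Value[i] is the snippet for document i in field entry.Key.
                lines.Add($"doc {i} [{entry.Key}]: {entry.Value[i]}");
            }
        }
        return lines;
    }
}
```

Here `docCount` would typically be the number of scored documents in `topDocs`.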

Type	Condition
IOException	if an I/O error occurred during processing
ArgumentException	if `field` was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Highlights the top-N passages from multiple fields.

Conceptually, this behaves as a more efficient form of:

IDictionary<string, string[]> m = new Dictionary<string, string[]>();
foreach (string field in fields)
{
    m[field] = Highlight(field, query, searcher, topDocs, maxPassages);
}
return m;

Type	Name	Description
string[]	fields	field names to highlight. Must have a stored string value and also be indexed with offsets.
Query	query	query to highlight.
IndexSearcher	searcher	searcher that was previously used to execute the query.
TopDocs	topDocs	TopDocs containing the summary result documents to highlight.
int[]	maxPassages	The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

Type	Description
IDictionary<string, string[]>	A dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in `topDocs`. If no highlights were found for a document, the first `maxPassages` sentences from the field will be returned.
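
The per-field limits might be passed as in the following sketch. It assumes that `maxPassages[i]` applies to `fields[i]` and that the overload is callable as HighlightFields(string[], Query, IndexSearcher, TopDocs, int[]); the field names "title" and "body" and the wrapper class are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

public static class PerFieldPassages
{
    public static IDictionary<string, string[]> Highlight(
        ICUPostingsHighlighter highlighter, Query query, IndexSearcher searcher, TopDocs topDocs)
    {
        // Hypothetical field names; both must be stored and indexed with offsets.
        string[] fields = { "title", "body" };
        // Assumed to align by index with `fields`: one passage for title, up to three for body.
        int[] maxPassages = { 1, 3 };
        return highlighter.HighlightFields(fields, query, searcher, topDocs, maxPassages);
    }
}
```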

Type	Condition
IOException	if an I/O error occurred during processing
ArgumentException	if `field` was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Highlights the top-N passages from multiple fields, for the provided int[] docids.

Type	Name	Description
string[]	fieldsIn	field names to highlight. Must have a stored string value and also be indexed with offsets.
Query	query	query to highlight.
IndexSearcher	searcher	searcher that was previously used to execute the query.
int[]	docidsIn	Array containing the document IDs to highlight.
int[]	maxPassagesIn	The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

Type	Description
IDictionary<string, string[]>	A dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in `docidsIn`. If no highlights were found for a document, the first `maxPassagesIn` sentences from the field will be returned.

Type	Condition
IOException	if an I/O error occurred during processing
ArgumentException	if `field` was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to a custom object as returned by the PassageFormatter. Use this API to render to something other than string.

Type	Name	Description
string[]	fieldsIn	field names to highlight. Must have a stored string value and also be indexed with offsets.
Query	query	query to highlight.
IndexSearcher	searcher	searcher that was previously used to execute the query.
int[]	docidsIn	Array containing the document IDs to highlight.
int[]	maxPassagesIn	The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

Type	Description
IDictionary<string, object[]>	A dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in `docidsIn`. If no highlights were found for a document, the first `maxPassagesIn` sentences from the field will be returned.

Type	Condition
IOException	if an I/O error occurred during processing
ArgumentException	if `field` was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Loads the string values for each field and docID pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate the string[fields.Length][docids.Length] and fill all values. The returned strings must be identical to what was indexed.

Type	Name	Description
IndexSearcher	searcher
string[]	fields
int[]	docids
int	maxLength

Type	Description
IList<string[]>
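
The fields-by-docids layout described above can be sketched without any index at all. In the snippet below, FieldValueLoader and the `lookup` delegate are hypothetical stand-ins for whatever source a subclass reads from; returning an empty string for a missing value and cutting values at `maxLength` are assumptions, not documented behavior.

```csharp
using System;
using System.Collections.Generic;

public static class FieldValueLoader
{
    // Builds one string[] per field, each with one entry per docid, in input order.
    public static IList<string[]> Load(string[] fields, int[] docids, int maxLength,
                                       Func<string, int, string> lookup)
    {
        var result = new List<string[]>(fields.Length);
        foreach (string field in fields)
        {
            var values = new string[docids.Length];
            for (int i = 0; i < docids.Length; i++)
            {
                // Assumption: a missing value is represented as an empty string.
                string value = lookup(field, docids[i]) ?? string.Empty;
                // Assumption: text beyond maxLength is not needed for highlighting.
                values[i] = value.Length > maxLength ? value.Substring(0, maxLength) : value;
            }
            result.Add(values);
        }
        return result;
    }
}
```

A subclass that pulls text from an external store could delegate to a helper of this shape from its override.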

Inheritance

Inherited Members

Namespace: Lucene.Net.Search.PostingsHighlight

Assembly: Lucene.Net.ICU.dll

Syntax

Remarks

Note

Constructors

ICUPostingsHighlighter()

Declaration

Remarks

Note

ICUPostingsHighlighter(int)

Declaration

Parameters

Remarks

Note

Exceptions

Fields

DEFAULT_MAX_LENGTH

Declaration

Field Value

Remarks

Note

Methods

GetBreakIterator(string)

Declaration

Parameters

Returns

Remarks

Note

GetEmptyHighlight(string, BreakIterator, int)

Declaration

Parameters

Returns

Remarks

Note

GetFormatter(string)

Declaration

Parameters

Returns

Remarks

Note

GetIndexAnalyzer(string)

Declaration

Parameters

Returns

Remarks

Note

GetMultiValuedSeparator(string)

Declaration

Parameters

Returns

Remarks

Note

GetScorer(string)

Declaration

Parameters

Returns

Remarks

Note

Highlight(string, Query, IndexSearcher, TopDocs)

Declaration

Parameters

Returns

Remarks

Note

Exceptions

Highlight(string, Query, IndexSearcher, TopDocs, int)

Declaration

Parameters

Returns

Remarks

Note

Exceptions

HighlightFields(string[], Query, IndexSearcher, TopDocs)

Declaration

Parameters

Returns