Class ICUPostingsHighlighter
Simple highlighter that does not analyze fields nor use term vectors. Instead it requires Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
Namespace: Lucene.Net.Search.PostingsHighlight
Assembly: Lucene.Net.ICU.dll
Syntax
public class ICUPostingsHighlighter
Remarks
PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses an ICU4N.Text.BreakIterator to find passages in the text; by default it breaks using GetSentenceInstance(CultureInfo) (for sentence breaking). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
You can customize the behavior by subclassing this highlighter. Some important hooks:
- GetBreakIterator(string): Customize how the text is divided into passages.
- GetScorer(string): Customize how passages are ranked.
- GetFormatter(string): Customize how snippets are formatted.
- GetIndexAnalyzer(string): Enable highlighting of MultiTermQuery instances such as Lucene.Net.Search.WildcardQuery.
// configure field with offsets at index time
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
Field body = new Field("body", "foobar", offsetsType);
// retrieve highlights at query time (n is the number of top hits to fetch)
ICUPostingsHighlighter highlighter = new ICUPostingsHighlighter();
Query query = new TermQuery(new Term("body", "highlighting"));
TopDocs topDocs = searcher.Search(query, n);
string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);
This is thread-safe, and can be used across different readers.
Note that the .NET implementation differs from the PostingsHighlighter in Lucene in that it is backed by an ICU ICU4N.Text.RuleBasedBreakIterator, whose default behavior differs slightly from that of the one in the JDK. However, the ICU ICU4N.Text.RuleBasedBreakIterator can be customized to handle many scenarios that the one in the JDK cannot. See the ICU documentation at http://userguide.icu-project.org/boundaryanalysis/break-rules for more information on how to pass custom rules to an ICU ICU4N.Text.RuleBasedBreakIterator.
Note
This API is experimental and might change in incompatible ways in the next release.
Constructors
ICUPostingsHighlighter()
Creates a new highlighter with DEFAULT_MAX_LENGTH.
Declaration
public ICUPostingsHighlighter()
ICUPostingsHighlighter(int)
Creates a new highlighter, specifying maximum content length.
Declaration
public ICUPostingsHighlighter(int maxLength)
Parameters
Type | Name | Description |
---|---|---|
int | maxLength | maximum content size to process. |
Exceptions
Type | Condition |
---|---|
ArgumentException | if maxLength is negative or int.MaxValue. |
Fields
DEFAULT_MAX_LENGTH
Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.
Declaration
public static readonly int DEFAULT_MAX_LENGTH
Field Value
Type | Description |
---|---|
int |
Methods
GetBreakIterator(string)
Returns the ICU4N.Text.BreakIterator to use for dividing text into passages. By default this returns a sentence iterator obtained from GetSentenceInstance(CultureInfo); subclasses can override to customize.
Declaration
protected virtual BreakIterator GetBreakIterator(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
BreakIterator |
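As a minimal sketch of this hook, the subclass below picks a culture-specific sentence iterator for one field and falls back to the default for all others. The class name `PerFieldBreakHighlighter`, the field name `body_de`, and the choice of the `de` culture are hypothetical; the only API calls used are the `GetBreakIterator(string)` override and `GetSentenceInstance(CultureInfo)` named above.

```csharp
using System.Globalization;
using ICU4N.Text;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: the name and the "body_de" field are illustrative only.
public class PerFieldBreakHighlighter : ICUPostingsHighlighter
{
    protected override BreakIterator GetBreakIterator(string field)
    {
        // Use German sentence-breaking rules for the (assumed) German field.
        if (field == "body_de")
        {
            return BreakIterator.GetSentenceInstance(new CultureInfo("de"));
        }

        // Every other field keeps the highlighter's default iterator.
        return base.GetBreakIterator(field);
    }
}
```

The override creates a fresh iterator on each call instead of caching one, since a break iterator carries per-text state.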
GetEmptyHighlight(string, BreakIterator, int)
Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize.
Declaration
protected virtual Passage[] GetEmptyHighlight(string fieldName, BreakIterator bi, int maxPassages)
Parameters
Type | Name | Description |
---|---|---|
string | fieldName | |
BreakIterator | bi | |
int | maxPassages |
Returns
Type | Description |
---|---|
Passage[] |
GetFormatter(string)
Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.
Declaration
protected virtual PassageFormatter GetFormatter(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
PassageFormatter |
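A sketch of overriding this hook to change the markup around matched terms. It assumes that `DefaultPassageFormatter` is available in the same namespace with a `(string preTag, string postTag, string ellipsis, bool escape)` constructor, as in the Java PostingsHighlighter; the class name `MarkTagHighlighter` and the `<mark>` tags are hypothetical choices.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass that wraps matches in <mark> tags.
public class MarkTagHighlighter : ICUPostingsHighlighter
{
    protected override PassageFormatter GetFormatter(string field)
    {
        // Assumed constructor arguments: opening tag, closing tag,
        // text placed between non-adjacent passages, and whether to
        // HTML-escape the passage content.
        return new DefaultPassageFormatter("<mark>", "</mark>", "... ", true);
    }
}
```

Turning on escaping is a reasonable default when snippets are rendered into an HTML page, because the stored field text is otherwise emitted verbatim.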
GetIndexAnalyzer(string)
Returns the analyzer originally used to index the content for field.
Declaration
protected virtual Analyzer GetIndexAnalyzer(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
Analyzer | Lucene.Net.Analysis.Analyzer or null (the default, meaning no special multi-term processing) |
GetMultiValuedSeparator(string)
Returns the logical separator between values for multi-valued fields.
The default value is a space character, which means passages can span across values,
but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS)
if each value holds a discrete passage for highlighting.
Declaration
protected virtual char GetMultiValuedSeparator(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
char |
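The paragraph-separator case described above could be sketched as follows. The class name `DiscreteValueHighlighter` is hypothetical; the override and the U+2029 character come from the description of this member.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass for multi-valued fields where each value
// should be treated as its own passage.
public class DiscreteValueHighlighter : ICUPostingsHighlighter
{
    protected override char GetMultiValuedSeparator(string field)
    {
        // U+2029 PARAGRAPH SEPARATOR keeps passages from spanning
        // across adjacent values, unlike the default space character.
        return '\u2029';
    }
}
```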
GetScorer(string)
Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.
Declaration
protected virtual PassageScorer GetScorer(string field)
Parameters
Type | Name | Description |
---|---|---|
string | field |
Returns
Type | Description |
---|---|
PassageScorer |
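A sketch of overriding this hook to tune passage ranking. It assumes `PassageScorer` exposes a `(float k1, float b, float pivot)` constructor, as in the Java PostingsHighlighter, where `k1` and `b` are BM25 parameters and `pivot` is the passage length used for normalization. The class name `ShortPassageHighlighter` and the pivot value are hypothetical.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass that normalizes scores around shorter passages.
public class ShortPassageHighlighter : ICUPostingsHighlighter
{
    protected override PassageScorer GetScorer(string field)
    {
        // Assumed arguments: BM25 k1, BM25 b, and the pivot length in
        // characters. The value 40 is an illustrative choice, not a
        // recommended setting.
        return new PassageScorer(1.2f, 0.75f, 40f);
    }
}
```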
Highlight(string, Query, IndexSearcher, TopDocs)
Highlights the top passages from a single field.
Declaration
public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs)
Parameters
Type | Name | Description |
---|---|---|
string | field | field name to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
Returns
Type | Description |
---|---|
string[] | Array of formatted snippets corresponding to the documents in topDocs. |
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurred during processing |
ArgumentException | if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS |
Highlight(string, Query, IndexSearcher, TopDocs, int)
Highlights the top-N passages from a single field.
Declaration
public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
Parameters
Type | Name | Description |
---|---|---|
string | field | field name to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
int | maxPassages | The maximum number of top-N ranked passages used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
string[] | Array of formatted snippets corresponding to the documents in topDocs. |
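The following end-to-end sketch shows one way this overload might be called: it indexes a single document with offsets, runs a term query, and requests up to two passages per document. It assumes Lucene.NET 4.8 with the analysis-common package for `StandardAnalyzer`; the class name `HighlightDemo`, the sample text, and the use of `RAMDirectory` are illustrative choices only.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class HighlightDemo
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        using var dir = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);

        // Stored text field that also records offsets in the postings.
        var offsetsType = new FieldType(TextField.TYPE_STORED)
        {
            IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
        };

        // Index one sample document; disposing the writer commits it.
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(version, analyzer)))
        {
            var doc = new Document();
            doc.Add(new Field(
                "body",
                "Highlighting is fun. This sentence is unrelated. Postings make highlighting fast.",
                offsetsType));
            writer.AddDocument(doc);
        }

        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);

        Query query = new TermQuery(new Term("body", "highlighting"));
        TopDocs topDocs = searcher.Search(query, 10);

        // Ask for at most two top-ranked passages per document.
        var highlighter = new ICUPostingsHighlighter();
        string[] snippets = highlighter.Highlight("body", query, searcher, topDocs, 2);

        // One snippet per document in topDocs, in the same order.
        foreach (string snippet in snippets)
        {
            Console.WriteLine(snippet);
        }
    }
}
```

The same searcher that executed the query is passed to the highlighter, as the searcher parameter description requires.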
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurred during processing |
ArgumentException | Illegal if |
HighlightFields(string[], Query, IndexSearcher, TopDocs)
Highlights the top passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
IDictionary<string, string[]> m = new Dictionary<string, string[]>();
foreach (string field in fields)
{
    m[field] = Highlight(field, query, searcher, topDocs);
}
return m;
Declaration
public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
Parameters
Type | Name | Description |
---|---|---|
string[] | fields | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
Returns
Type | Description |
---|---|
IDictionary<string, string[]> | IDictionary<string, string[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in |
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurred during processing |
ArgumentException | if |
HighlightFields(string[], Query, IndexSearcher, TopDocs, int[])
Highlights the top-N passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
IDictionary<string, string[]> m = new Dictionary<string, string[]>();
foreach (string field in fields)
{
    m[field] = Highlight(field, query, searcher, topDocs, maxPassages);
}
return m;
Declaration
public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages)
Parameters
Type | Name | Description |
---|---|---|
string[] | fields | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
int[] | maxPassages | The maximum number of top-N ranked passages per-field used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
IDictionary<string, string[]> | IDictionary<string, string[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in |
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurred during processing |
ArgumentException | if |
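A usage sketch for this overload follows. It assumes that each entry of maxPassages applies to the field at the same position in fields, and that every returned array holds one snippet per document in topDocs. The field names "title" and "body", the hit count of 10, and the helper name PrintSnippets are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

public static class HighlightFieldsExample
{
    // searcher and query are supplied by the caller; both fields are assumed
    // to be stored and indexed with offsets.
    public static void PrintSnippets(IndexSearcher searcher, Query query)
    {
        var highlighter = new ICUPostingsHighlighter();
        TopDocs topDocs = searcher.Search(query, 10);

        string[] fields = { "title", "body" };
        int[] maxPassages = { 1, 3 };   // assumed to align with fields by position

        IDictionary<string, string[]> snippets =
            highlighter.HighlightFields(fields, query, searcher, topDocs, maxPassages);

        foreach (string field in fields)
        {
            string[] perDocument = snippets[field];
            for (int i = 0; i < perDocument.Length; i++)
            {
                Console.WriteLine($"{field} [hit {i}]: {perDocument[i]}");
            }
        }
    }
}
```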
HighlightFields(string[], Query, IndexSearcher, int[], int[])
Highlights the top-N passages from multiple fields, for the provided int[] docids.
Declaration
public virtual IDictionary<string, string[]> HighlightFields(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
Parameters
Type | Name | Description |
---|---|---|
string[] | fieldsIn | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
int[] | docidsIn | Array containing the document IDs to highlight. |
int[] | maxPassagesIn | The maximum number of top-N ranked passages per-field used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
IDictionary<string, string[]> | IDictionary<string, string[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in |
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurred during processing |
ArgumentException | if |
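The docids overload is useful when only some of the hits need snippets, for example one page of results. The sketch below assumes the document IDs are taken from TopDocs.ScoreDocs; the helper name HighlightPage, the "body" field, and the passage limit of 2 are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

public static class HighlightByDocIdExample
{
    // Highlights only the hits in the range [skip, skip + take) of topDocs.
    public static IDictionary<string, string[]> HighlightPage(
        IndexSearcher searcher, Query query, TopDocs topDocs, int skip, int take)
    {
        // Collect the document IDs for the requested page of hits.
        int[] docids = topDocs.ScoreDocs
            .Skip(skip)
            .Take(take)
            .Select(scoreDoc => scoreDoc.Doc)
            .ToArray();

        var highlighter = new ICUPostingsHighlighter();
        string[] fields = { "body" };
        int[] maxPassages = { 2 };

        return highlighter.HighlightFields(fields, query, searcher, docids, maxPassages);
    }
}
```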
HighlightFieldsAsObjects(string[], Query, IndexSearcher, int[], int[])
Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, into custom objects as returned by the PassageFormatter. Use this API to render to something other than a string.
Declaration
protected virtual IDictionary<string, object[]> HighlightFieldsAsObjects(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
Parameters
Type | Name | Description |
---|---|---|
string[] | fieldsIn | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
int[] | docidsIn | Array containing the document IDs to highlight. |
int[] | maxPassagesIn | The maximum number of top-N ranked passages per-field used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
IDictionary<string, object[]> | IDictionary<string, object[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in |
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurred during processing |
ArgumentException | if |
LoadFieldValues(IndexSearcher, string[], int[], int)
Loads the string values for each field and docID pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate one string[] per field, each of length docids.Length (conceptually string[fields.Length][docids.Length]), and fill all values. The returned strings must be identical to what was indexed.
Declaration
protected virtual IList<string[]> LoadFieldValues(IndexSearcher searcher, string[] fields, int[] docids, int maxLength)
Parameters
Type | Name | Description |
---|---|---|
IndexSearcher | searcher | |
string[] | fields | |
int[] | docids | |
int | maxLength |
Returns
Type | Description |
---|---|
IList<string[]> |
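A subclass that changes the source of the text might look like the sketch below. The external store (a dictionary keyed by docID and then by field name), the class name ExternalTextHighlighter, the use of an empty string for missing text, and the treatment of maxLength as a truncation limit are all assumptions made for illustration. Keying by docID is a simplification, and the stored text must match what was indexed.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass that reads field text from an in-memory store
// instead of stored fields.
public class ExternalTextHighlighter : ICUPostingsHighlighter
{
    // Hypothetical external store: docID -> (field name -> original text).
    private readonly IDictionary<int, IDictionary<string, string>> store;

    public ExternalTextHighlighter(IDictionary<int, IDictionary<string, string>> store)
    {
        this.store = store ?? throw new ArgumentNullException(nameof(store));
    }

    protected override IList<string[]> LoadFieldValues(
        IndexSearcher searcher, string[] fields, int[] docids, int maxLength)
    {
        // One string[] per field, each with one slot per requested document.
        var contents = new string[fields.Length][];
        for (int f = 0; f < fields.Length; f++)
        {
            contents[f] = new string[docids.Length];
            for (int d = 0; d < docids.Length; d++)
            {
                string text = null;
                if (store.TryGetValue(docids[d], out var byField))
                {
                    byField.TryGetValue(fields[f], out text);
                }

                // Assumption: missing text becomes an empty string, and
                // maxLength caps the number of characters handed back.
                if (text == null) text = string.Empty;
                if (text.Length > maxLength) text = text.Substring(0, maxLength);
                contents[f][d] = text;
            }
        }
        return contents;
    }
}
```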