Class ICUPostingsHighlighter
Simple highlighter that does not analyze fields nor use term vectors. Instead it requires DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
Inheritance
Namespace: Lucene.Net.Search.PostingsHighlight
Assembly: Lucene.Net.ICU.dll
Syntax
public class ICUPostingsHighlighter : object
Remarks
PostingsHighlighter treats the single original document as the whole corpus, and then scores individual
passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text.
You can customize the behavior by subclassing this highlighter. Some important hooks:
- GetBreakIterator(String): Customize how the text is divided into passages.
- GetScorer(String): Customize how passages are ranked.
- GetFormatter(String): Customize how snippets are formatted.
- GetIndexAnalyzer(String): Enable highlighting of MultiTermQuerys such as WildcardQuery.
WARNING: The code is very new and probably still has some exciting bugs!
Example usage:
// configure field with offsets at index time
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
Field body = new Field("body", "foobar", offsetsType);
// retrieve highlights at query time (n is the number of top hits to fetch)
ICUPostingsHighlighter highlighter = new ICUPostingsHighlighter();
Query query = new TermQuery(new Term("body", "highlighting"));
TopDocs topDocs = searcher.Search(query, n);
string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);
This is thread-safe, and can be used across different readers.
Note that the .NET implementation differs from the PostingsHighlighter in Lucene in that it is backed by an ICU-based BreakIterator.
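The usage outlined above can be sketched end to end as follows. This is a minimal sketch, not a reference implementation: it assumes the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.ICU packages are referenced, and the `HighlightDemo` class name, the in-memory `RAMDirectory`, the sample text and the hit count of 10 are illustrative choices that do not come from this page.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical demo class; names and sample data are illustrative only.
public static class HighlightDemo
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Stored text field whose postings also record offsets.
        var offsetsType = new FieldType(TextField.TYPE_STORED)
        {
            IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
        };

        using (var dir = new RAMDirectory())
        {
            // Index a single document with the offsets-enabled field.
            using (var analyzer = new StandardAnalyzer(version))
            using (var writer = new IndexWriter(dir, new IndexWriterConfig(version, analyzer)))
            {
                var doc = new Document();
                doc.Add(new Field("body",
                    "This is a test. Highlighting the first term. Just another sentence.",
                    offsetsType));
                writer.AddDocument(doc);
                writer.Commit();
            }

            // Search, then highlight the hits using the same searcher.
            using (var reader = DirectoryReader.Open(dir))
            {
                var searcher = new IndexSearcher(reader);
                Query query = new TermQuery(new Term("body", "highlighting"));
                TopDocs topDocs = searcher.Search(query, 10);

                var highlighter = new ICUPostingsHighlighter();
                string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);
                foreach (string snippet in highlights)
                {
                    Console.WriteLine(snippet);
                }
            }
        }
    }
}
```

The query term is lowercase because `StandardAnalyzer` lowercases tokens at index time, and `TermQuery` does no analysis of its own.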
Constructors
ICUPostingsHighlighter()
Creates a new highlighter with DEFAULT_MAX_LENGTH.
Declaration
public ICUPostingsHighlighter()
ICUPostingsHighlighter(Int32)
Creates a new highlighter, specifying maximum content length.
Declaration
public ICUPostingsHighlighter(int maxLength)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | maxLength | maximum content size to process. |
Fields
DEFAULT_MAX_LENGTH
Default maximum content size to process. Typically snippets closer to the beginning of the document better summarize its content.
Declaration
public static readonly int DEFAULT_MAX_LENGTH
Field Value
Type | Description |
---|---|
System.Int32 |
Methods
GetBreakIterator(String)
Returns the BreakIterator to use for dividing text into passages. Subclasses can override to customize.
Declaration
protected virtual BreakIterator GetBreakIterator(string field)
Parameters
Type | Name | Description |
---|---|---|
System.String | field |
Returns
Type | Description |
---|---|
BreakIterator |
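As an illustration of this hook, the sketch below treats the whole field value as one passage by returning a `WholeBreakIterator`. It assumes that `WholeBreakIterator` is available in `Lucene.Net.Search.PostingsHighlight` and that `BreakIterator` resolves from the `ICU4N.Text` namespace, which may differ between Lucene.NET releases. `WholeFieldHighlighter` is a hypothetical name.

```csharp
using ICU4N.Text; // assumption: BreakIterator comes from ICU4N in the release in use
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: highlights the entire field value as a single passage.
public class WholeFieldHighlighter : ICUPostingsHighlighter
{
    public WholeFieldHighlighter(int maxLength) : base(maxLength)
    {
    }

    protected override BreakIterator GetBreakIterator(string field)
    {
        // One passage spanning the whole text instead of one passage per sentence.
        return new WholeBreakIterator();
    }
}
```

Because the whole value becomes one passage, the `maxLength` passed to the constructor effectively bounds the snippet size.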
GetEmptyHighlight(String, BreakIterator, Int32)
Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize.
Declaration
protected virtual Passage[] GetEmptyHighlight(string fieldName, BreakIterator bi, int maxPassages)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | |
BreakIterator | bi | |
System.Int32 | maxPassages |
Returns
Type | Description |
---|---|
Passage[] |
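A subclass that prefers no summary at all for documents without hits could override this method along the following lines. This is a sketch under assumptions: `NoFallbackHighlighter` is a hypothetical name, `BreakIterator` is assumed to come from `ICU4N.Text`, and the resulting snippet for such documents (null in upstream Lucene) should be verified against the release in use.

```csharp
using ICU4N.Text; // assumption: BreakIterator comes from ICU4N in the release in use
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: produces no fallback passages when nothing matched.
public class NoFallbackHighlighter : ICUPostingsHighlighter
{
    protected override Passage[] GetEmptyHighlight(string fieldName, BreakIterator bi, int maxPassages)
    {
        // Returning an empty array skips the default "first sentences" summary.
        return new Passage[0];
    }
}
```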
GetFormatter(String)
Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.
Declaration
protected virtual PassageFormatter GetFormatter(string field)
Parameters
Type | Name | Description |
---|---|---|
System.String | field |
Returns
Type | Description |
---|---|
PassageFormatter |
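For example, a subclass might swap the markup used around matches. The sketch below assumes that `DefaultPassageFormatter` exposes a `(preTag, postTag, ellipsis, escape)` constructor as in upstream Lucene; `EmphasisHighlighter` and the chosen tags are illustrative.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: wraps matches in <em> tags and HTML-escapes the text.
public class EmphasisHighlighter : ICUPostingsHighlighter
{
    protected override PassageFormatter GetFormatter(string field)
    {
        // Arguments: pre-tag, post-tag, ellipsis between passages, escape flag.
        return new DefaultPassageFormatter("<em>", "</em>", "... ", true);
    }
}
```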
GetIndexAnalyzer(String)
Returns the analyzer originally used to index the content for field. This is used to highlight some MultiTermQuerys.
Declaration
protected virtual Analyzer GetIndexAnalyzer(string field)
Parameters
Type | Name | Description |
---|---|---|
System.String | field |
Returns
Type | Description |
---|---|
Analyzer | Analyzer or null (the default, meaning no special multi-term processing) |
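A minimal sketch of enabling multi-term highlighting is shown below. `MultiTermHighlighter` is a hypothetical name, and the caller is assumed to pass the same analyzer instance (or an equivalent one) that was used at index time.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: supplies the index-time analyzer so that queries such
// as WildcardQuery can be highlighted.
public class MultiTermHighlighter : ICUPostingsHighlighter
{
    private readonly Analyzer indexAnalyzer;

    public MultiTermHighlighter(Analyzer indexAnalyzer)
    {
        this.indexAnalyzer = indexAnalyzer ?? throw new ArgumentNullException(nameof(indexAnalyzer));
    }

    protected override Analyzer GetIndexAnalyzer(string field)
    {
        // The same analyzer is returned for every field in this sketch.
        return indexAnalyzer;
    }
}
```

With the default (null) analyzer, no special multi-term processing takes place.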
GetMultiValuedSeparator(String)
Returns the logical separator between values for multi-valued fields.
The default value is a space character, which means passages can span across values,
but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS)
if each value holds a discrete passage for highlighting.
Declaration
protected virtual char GetMultiValuedSeparator(string field)
Parameters
Type | Name | Description |
---|---|---|
System.String | field |
Returns
Type | Description |
---|---|
System.Char |
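The override described above can be sketched as follows, assuming each value of a multi-valued field should be highlighted as its own passage. `DiscreteValuesHighlighter` is a hypothetical name.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: keeps passages from spanning across field values.
public class DiscreteValuesHighlighter : ICUPostingsHighlighter
{
    protected override char GetMultiValuedSeparator(string field)
    {
        // U+2029 PARAGRAPH SEPARATOR (PS) instead of the default space.
        return '\u2029';
    }
}
```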
GetScorer(String)
Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.
Declaration
protected virtual PassageScorer GetScorer(string field)
Parameters
Type | Name | Description |
---|---|---|
System.String | field |
Returns
Type | Description |
---|---|
PassageScorer |
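As a sketch of customizing the ranking, the subclass below returns a scorer with length normalization switched off. It assumes that `PassageScorer` exposes the `(k1, b, pivot)` constructor found in upstream Lucene, where the defaults are 1.2, 0.75 and 87. `FlatLengthHighlighter` is a hypothetical name.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: ranks passages without penalizing longer ones.
public class FlatLengthHighlighter : ICUPostingsHighlighter
{
    protected override PassageScorer GetScorer(string field)
    {
        // k1 = 1.2 (term-frequency saturation), b = 0 (no length normalization),
        // pivot = 87 (pivot value used for length normalization).
        return new PassageScorer(1.2f, 0f, 87f);
    }
}
```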
Highlight(String, Query, IndexSearcher, TopDocs)
Highlights the top passages from a single field.
Declaration
public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs)
Parameters
Type | Name | Description |
---|---|---|
System.String | field | field name to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
Returns
Type | Description |
---|---|
System.String[] | Array of formatted snippets corresponding to the documents in topDocs. |
Highlight(String, Query, IndexSearcher, TopDocs, Int32)
Highlights the top-N passages from a single field.
Declaration
public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
Parameters
Type | Name | Description |
---|---|---|
System.String | field | field name to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
System.Int32 | maxPassages | The maximum number of top-N ranked passages used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
System.String[] | Array of formatted snippets corresponding to the documents in topDocs. |
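A call to this overload might look like the sketch below. The `SnippetHelper` class, the "body" field name, the hit count of 10 and the limit of two passages are illustrative assumptions.

```csharp
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical helper; the field name and limits are illustrative.
public static class SnippetHelper
{
    public static string[] TopTwoPassages(IndexSearcher searcher, Query query)
    {
        // Run the query, then highlight the hits with the same searcher.
        TopDocs topDocs = searcher.Search(query, 10);
        var highlighter = new ICUPostingsHighlighter();

        // Each snippet is built from at most two top-ranked passages.
        return highlighter.Highlight("body", query, searcher, topDocs, 2);
    }
}
```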
HighlightFields(String[], Query, IndexSearcher, TopDocs)
Highlights the top passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
IDictionary<string, string[]> m = new Dictionary<string, string[]>();
foreach (string field in fields)
{
m[field] = Highlight(field, query, searcher, topDocs);
}
return m;
Declaration
public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
Parameters
Type | Name | Description |
---|---|---|
System.String[] | fields | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
Returns
Type | Description |
---|---|
IDictionary<System.String, System.String[]> | |
HighlightFields(String[], Query, IndexSearcher, TopDocs, Int32[])
Highlights the top-N passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
IDictionary<string, string[]> m = new Dictionary<string, string[]>();
foreach (string field in fields)
{
m[field] = Highlight(field, query, searcher, topDocs, maxPassages);
}
return m;
Declaration
public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages)
Parameters
Type | Name | Description |
---|---|---|
System.String[] | fields | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
TopDocs | topDocs | TopDocs containing the summary result documents to highlight. |
System.Int32[] | maxPassages | The maximum number of top-N ranked passages per-field used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
IDictionary<System.String, System.String[]> | |
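The per-field limits line up with the field names by position. A minimal usage sketch follows, where the `MultiFieldSnippets` helper and the "title" and "body" field names are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical helper; field names and limits are illustrative.
public static class MultiFieldSnippets
{
    public static IDictionary<string, string[]> Highlight(IndexSearcher searcher, Query query, TopDocs topDocs)
    {
        var highlighter = new ICUPostingsHighlighter();
        string[] fields = { "title", "body" };
        int[] maxPassages = { 1, 3 }; // one passage for title, up to three for body

        // The result is keyed by field name; each array has one entry per hit.
        return highlighter.HighlightFields(fields, query, searcher, topDocs, maxPassages);
    }
}
```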
HighlightFields(String[], Query, IndexSearcher, Int32[], Int32[])
Highlights the top-N passages from multiple fields, for the provided int[] docids.
Declaration
public virtual IDictionary<string, string[]> HighlightFields(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
Parameters
Type | Name | Description |
---|---|---|
System.String[] | fieldsIn | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
System.Int32[] | docidsIn | Array containing the document IDs to highlight. |
System.Int32[] | maxPassagesIn | The maximum number of top-N ranked passages per-field used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
IDictionary<System.String, System.String[]> | |
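When the document IDs come from an earlier search, they can be collected from the hits before calling this overload. The following is a sketch only: the `DocIdSnippets` helper and the "body" field name are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical helper; the field name and passage limit are illustrative.
public static class DocIdSnippets
{
    public static IDictionary<string, string[]> Highlight(IndexSearcher searcher, Query query, TopDocs topDocs)
    {
        // Collect the document IDs of the hits.
        ScoreDoc[] hits = topDocs.ScoreDocs;
        int[] docids = new int[hits.Length];
        for (int i = 0; i < hits.Length; i++)
        {
            docids[i] = hits[i].Doc;
        }

        var highlighter = new ICUPostingsHighlighter();
        return highlighter.HighlightFields(new[] { "body" }, query, searcher, docids, new[] { 2 });
    }
}
```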
HighlightFieldsAsObjects(String[], Query, IndexSearcher, Int32[], Int32[])
Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom objects as returned by the PassageFormatter. Use this API to render to something other than string.
Declaration
protected virtual IDictionary<string, object[]> HighlightFieldsAsObjects(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
Parameters
Type | Name | Description |
---|---|---|
System.String[] | fieldsIn | field names to highlight. Must have a stored string value and also be indexed with offsets. |
Query | query | query to highlight. |
IndexSearcher | searcher | searcher that was previously used to execute the query. |
System.Int32[] | docidsIn | Array containing the document IDs to highlight. |
System.Int32[] | maxPassagesIn | The maximum number of top-N ranked passages per-field used to form the highlighted snippets. |
Returns
Type | Description |
---|---|
IDictionary<System.String, System.Object[]> | |
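One way to use this method is to pair it with a custom formatter that returns structured data rather than markup. The sketch below assumes that `PassageFormatter.Format(Passage[], string)` is the abstract method to override and that `Passage` exposes `StartOffset` and `EndOffset`, as in upstream Lucene. `OffsetFormatter`, `OffsetHighlighter` and `HighlightOffsets` are hypothetical names.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical formatter: returns passage offsets instead of a marked-up string.
public class OffsetFormatter : PassageFormatter
{
    public override object Format(Passage[] passages, string content)
    {
        var ranges = new List<int[]>();
        foreach (Passage passage in passages)
        {
            // Record the [start, end) character range of each passage.
            ranges.Add(new[] { passage.StartOffset, passage.EndOffset });
        }
        return ranges;
    }
}

// Hypothetical subclass exposing the protected object-returning API.
public class OffsetHighlighter : ICUPostingsHighlighter
{
    protected override PassageFormatter GetFormatter(string field)
    {
        return new OffsetFormatter();
    }

    public IDictionary<string, object[]> HighlightOffsets(
        string[] fields, Query query, IndexSearcher searcher, int[] docids, int[] maxPassages)
    {
        // Each object in the result is whatever OffsetFormatter.Format returned.
        return HighlightFieldsAsObjects(fields, query, searcher, docids, maxPassages);
    }
}
```

Callers would cast each returned object back to `List<int[]>` in this sketch.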
LoadFieldValues(IndexSearcher, String[], Int32[], Int32)
Loads the string values for each field X docID to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate one string[docids.Length] array per field (fields.Length arrays in total) and fill all values. The returned strings must be identical to what was indexed.
Declaration
protected virtual IList<string[]> LoadFieldValues(IndexSearcher searcher, string[] fields, int[] docids, int maxLength)
Parameters
Type | Name | Description |
---|---|---|
IndexSearcher | searcher | |
System.String[] | fields | |
System.Int32[] | docids | |
System.Int32 | maxLength |
Returns
Type | Description |
---|---|
IList<System.String[]> |
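A subclass that keeps the original text outside the index might override this method roughly as follows. This is a sketch under stated assumptions: the `lookup` delegate is a hypothetical stand-in for whatever external store holds the text, and truncating to `maxLength` mirrors what the description of that parameter suggests rather than a documented requirement.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass: loads field text from an external source.
public class ExternalStoreHighlighter : ICUPostingsHighlighter
{
    // Hypothetical lookup: (field name, document ID) -> exact text that was indexed.
    private readonly Func<string, int, string> lookup;

    public ExternalStoreHighlighter(Func<string, int, string> lookup)
    {
        this.lookup = lookup ?? throw new ArgumentNullException(nameof(lookup));
    }

    protected override IList<string[]> LoadFieldValues(
        IndexSearcher searcher, string[] fields, int[] docids, int maxLength)
    {
        // One string[docids.Length] array per field.
        var contents = new string[fields.Length][];
        for (int i = 0; i < fields.Length; i++)
        {
            contents[i] = new string[docids.Length];
            for (int j = 0; j < docids.Length; j++)
            {
                string text = lookup(fields[i], docids[j]);
                if (text != null && text.Length > maxLength)
                {
                    text = text.Substring(0, maxLength); // keep only the processed prefix
                }
                contents[i][j] = text;
            }
        }
        return contents; // string[][] is usable as IList<string[]>
    }
}
```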