
    Class ICUPostingsHighlighter

    Simple highlighter that does not analyze fields nor use term vectors. Instead it requires DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

    Inheritance
    System.Object
    ICUPostingsHighlighter
    Namespace: Lucene.Net.Search.PostingsHighlight
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public class ICUPostingsHighlighter : object
    Remarks

    PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks on sentence boundaries. It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
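The passage-building steps described above can be sketched without any Lucene.NET types. The following is a minimal illustration, not the library's implementation: SketchPassage, PassageSketch, and Coalesce are hypothetical names, sentence boundaries are passed in as precomputed end offsets instead of coming from a break iterator, a single sort stands in for the parallel merge over per-term positions, and the score is a plain hit count rather than whatever PassageScorer computes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration type; this is NOT the Lucene.NET Passage class.
public sealed class SketchPassage
{
    public int Start;                               // inclusive start offset of the sentence
    public int End;                                 // exclusive end offset of the sentence
    public List<string> Terms = new List<string>(); // query terms that hit inside it
    public double Score;                            // toy score: number of hits
}

public static class PassageSketch
{
    // hits: (term, start offset) pairs in any order.
    // sentenceEnds: exclusive end offset of each sentence, in ascending order.
    public static List<SketchPassage> Coalesce(
        IEnumerable<(string Term, int Offset)> hits,
        IReadOnlyList<int> sentenceEnds)
    {
        var passages = new List<SketchPassage>();
        SketchPassage current = null;
        int sentence = 0;

        // Visit hits in offset order (a sort here, assumed to stand in for the merge).
        foreach (var hit in hits.OrderBy(h => h.Offset))
        {
            // Advance to the sentence that contains this hit.
            while (sentence < sentenceEnds.Count && hit.Offset >= sentenceEnds[sentence])
            {
                sentence++;
            }
            if (sentence == sentenceEnds.Count)
            {
                break; // hit lies beyond the last sentence
            }

            int start = sentence == 0 ? 0 : sentenceEnds[sentence - 1];
            if (current == null || current.Start != start)
            {
                // First hit in a new sentence: open a new passage.
                current = new SketchPassage { Start = start, End = sentenceEnds[sentence] };
                passages.Add(current);
            }

            // Coalesce the hit into the passage for its sentence.
            current.Terms.Add(hit.Term);
            current.Score += 1.0;
        }

        // Rank passages, best first.
        return passages.OrderByDescending(p => p.Score).ToList();
    }
}
```

For instance, with sentence ends at offsets 16, 37, and 61 and hits at offsets 10, 16, 37, and 42, the sketch yields three passages, and the one spanning offsets 37 to 61 ranks first because it collects two hits.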

    You can customize the behavior by subclassing this highlighter. Some important hooks:

    • GetBreakIterator(String): Customize how the text is divided into passages.
    • GetScorer(String): Customize how passages are ranked.
    • GetFormatter(String): Customize how snippets are formatted.
    • GetIndexAnalyzer(String): Enable highlighting of MultiTermQuerys such as WildcardQuery.
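As a rough sketch of what such a subclass might look like, the example below overrides GetIndexAnalyzer(String) from the list above, plus GetMultiValuedSeparator(String), which is documented further down this page. The class name CustomHighlighter and its constructor are hypothetical; only the overridden member signatures are taken from this page, and the sketch assumes the Lucene.Net.ICU package is referenced.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass; only the overridden signatures come from this page.
public class CustomHighlighter : ICUPostingsHighlighter
{
    private readonly Analyzer indexAnalyzer;

    // indexAnalyzer is assumed to be the analyzer that was used at index time.
    public CustomHighlighter(Analyzer indexAnalyzer, int maxLength)
        : base(maxLength)
    {
        this.indexAnalyzer = indexAnalyzer;
    }

    // Supplying the index-time analyzer (instead of the default null)
    // enables highlighting of some MultiTermQuery types such as WildcardQuery.
    protected override Analyzer GetIndexAnalyzer(string field)
    {
        return indexAnalyzer;
    }

    // Join the values of a multi-valued field with U+2029 PARAGRAPH SEPARATOR
    // so that each value is treated as a discrete passage.
    protected override char GetMultiValuedSeparator(string field)
    {
        return '\u2029';
    }
}
```

Returning the same analyzer for every field is a simplification; a subclass could equally switch on the field name.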

    WARNING: The code is very new and probably still has some exciting bugs!

    Example usage:

        // Configure the field to index offsets (required by this highlighter).
        FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
        offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
        Field body = new Field("body", "foobar", offsetsType);

        // Retrieve highlights at query time. "searcher" is an existing IndexSearcher
        // and "n" is the number of top documents to fetch.
        ICUPostingsHighlighter highlighter = new ICUPostingsHighlighter();
        Query query = new TermQuery(new Term("body", "highlighting"));
        TopDocs topDocs = searcher.Search(query, n);
        string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);

    This is thread-safe, and can be used across different readers.

    Note that the .NET implementation differs from the PostingsHighlighter in Lucene in that it is backed by an ICU BreakIterator, whose default behavior differs slightly from that of the one in the JDK. However, the ICU behavior can be customized to cover many scenarios that the JDK one cannot. See the ICU documentation at http://userguide.icu-project.org/boundaryanalysis/break-rules for more information on how to pass custom rules to an ICU BreakIterator.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.

    Constructors


    ICUPostingsHighlighter()

    Creates a new highlighter with DEFAULT_MAX_LENGTH.

    Declaration
    public ICUPostingsHighlighter()

    ICUPostingsHighlighter(Int32)

    Creates a new highlighter, specifying maximum content length.

    Declaration
    public ICUPostingsHighlighter(int maxLength)
    Parameters
    Type Name Description
    System.Int32 maxLength

    maximum content size to process.

    Fields


    DEFAULT_MAX_LENGTH

    Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.

    Declaration
    public static readonly int DEFAULT_MAX_LENGTH
    Field Value
    Type Description
    System.Int32

    Methods


    GetBreakIterator(String)

    Returns the BreakIterator to use for dividing text into passages. This instantiates a sentence-breaking BreakIterator by default; subclasses can override to customize.

    Declaration
    protected virtual BreakIterator GetBreakIterator(string field)
    Parameters
    Type Name Description
    System.String field
    Returns
    Type Description
    BreakIterator

    GetEmptyHighlight(String, BreakIterator, Int32)

    Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize.

    Declaration
    protected virtual Passage[] GetEmptyHighlight(string fieldName, BreakIterator bi, int maxPassages)
    Parameters
    Type Name Description
    System.String fieldName
    BreakIterator bi
    System.Int32 maxPassages
    Returns
    Type Description
    Passage[]

    GetFormatter(String)

    Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.

    Declaration
    protected virtual PassageFormatter GetFormatter(string field)
    Parameters
    Type Name Description
    System.String field
    Returns
    Type Description
    PassageFormatter

    GetIndexAnalyzer(String)

    Returns the analyzer originally used to index the content for field.

    This is used to highlight some MultiTermQuerys.

    Declaration
    protected virtual Analyzer GetIndexAnalyzer(string field)
    Parameters
    Type Name Description
    System.String field
    Returns
    Type Description
    Analyzer

    Analyzer or null (the default, meaning no special multi-term processing)


    GetMultiValuedSeparator(String)

    Returns the logical separator between values for multi-valued fields. The default value is a space character, which means passages can span across values, but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting.

    Declaration
    protected virtual char GetMultiValuedSeparator(string field)
    Parameters
    Type Name Description
    System.String field
    Returns
    Type Description
    System.Char

    GetScorer(String)

    Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.

    Declaration
    protected virtual PassageScorer GetScorer(string field)
    Parameters
    Type Name Description
    System.String field
    Returns
    Type Description
    PassageScorer

    Highlight(String, Query, IndexSearcher, TopDocs)

    Highlights the top passages from a single field.

    Declaration
    public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs)
    Parameters
    Type Name Description
    System.String field

    field name to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    Returns
    Type Description
    System.String[]

    Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.


    Highlight(String, Query, IndexSearcher, TopDocs, Int32)

    Highlights the top-N passages from a single field.

    Declaration
    public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
    Parameters
    Type Name Description
    System.String field

    field name to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    System.Int32 maxPassages

    The maximum number of top-N ranked passages used to form the highlighted snippets.

    Returns
    Type Description
    System.String[]

    Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.


    HighlightFields(String[], Query, IndexSearcher, TopDocs)

    Highlights the top passages from multiple fields.

    Conceptually, this behaves as a more efficient form of:

    IDictionary<string, string[]> m = new Dictionary<string, string[]>();
    foreach (string field in fields)
    {
        m[field] = Highlight(field, query, searcher, topDocs);
    }
    return m;
    Declaration
    public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
    Parameters
    Type Name Description
    System.String[] fields

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    Returns
    Type Description
    IDictionary<System.String, System.String[]>

    Dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.


    HighlightFields(String[], Query, IndexSearcher, TopDocs, Int32[])

    Highlights the top-N passages from multiple fields.

    Conceptually, this behaves as a more efficient form of:

    IDictionary<string, string[]> m = new Dictionary<string, string[]>();
    for (int i = 0; i < fields.Length; i++)
    {
        // maxPassages holds one limit per field, in the same order as fields
        m[fields[i]] = Highlight(fields[i], query, searcher, topDocs, maxPassages[i]);
    }
    return m;
    Declaration
    public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages)
    Parameters
    Type Name Description
    System.String[] fields

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    System.Int32[] maxPassages

    The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

    Returns
    Type Description
    IDictionary<System.String, System.String[]>

    Dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
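A short usage sketch may help show how the per-field limits line up with the field names. It assumes that maxPassages is read in parallel with fields (one entry per field, same order), which is how the parameter description reads; the helper class HighlightFieldsSketch, the method TitleAndBody, and the field names "title" and "body" are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical helper; it only calls the HighlightFields overload declared above.
public static class HighlightFieldsSketch
{
    public static IDictionary<string, string[]> TitleAndBody(
        ICUPostingsHighlighter highlighter,
        Query query,
        IndexSearcher searcher,
        TopDocs topDocs)
    {
        // Both fields are assumed to be stored and indexed with offsets.
        string[] fields = { "title", "body" };

        // Assumed to be parallel to fields: 1 passage for "title", 3 for "body".
        int[] maxPassages = { 1, 3 };

        return highlighter.HighlightFields(fields, query, searcher, topDocs, maxPassages);
    }
}
```

Each returned array has one entry per document in topDocs, so result["body"][i] would be the snippet for the i-th hit.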


    HighlightFields(String[], Query, IndexSearcher, Int32[], Int32[])

    Highlights the top-N passages from multiple fields, for the provided int[] docids.

    Declaration
    public virtual IDictionary<string, string[]> HighlightFields(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
    Parameters
    Type Name Description
    System.String[] fieldsIn

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    System.Int32[] docidsIn

    Array containing the document IDs to highlight.

    System.Int32[] maxPassagesIn

    The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

    Returns
    Type Description
    IDictionary<System.String, System.String[]>

    Dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassagesIn sentences from the field will be returned.


    HighlightFieldsAsObjects(String[], Query, IndexSearcher, Int32[], Int32[])

    Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom objects as returned by the PassageFormatter. Use this API to render to something other than a string.

    Declaration
    protected virtual IDictionary<string, object[]> HighlightFieldsAsObjects(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
    Parameters
    Type Name Description
    System.String[] fieldsIn

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    System.Int32[] docidsIn

    Array containing the document IDs to highlight.

    System.Int32[] maxPassagesIn

    The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

    Returns
    Type Description
    IDictionary<System.String, System.Object[]>

    Dictionary keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassagesIn sentences from the field will be returned.


    LoadFieldValues(IndexSearcher, String[], Int32[], Int32)

    Loads the string values for each field × docID pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate one string[] per field, each of length docids.Length, and fill all values. The returned strings must be identical to what was indexed.

    Declaration
    protected virtual IList<string[]> LoadFieldValues(IndexSearcher searcher, string[] fields, int[] docids, int maxLength)
    Parameters
    Type Name Description
    IndexSearcher searcher
    System.String[] fields
    System.Int32[] docids
    System.Int32 maxLength
    Returns
    Type Description
    IList<System.String[]>
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)