    Class ICUPostingsHighlighter

    Simple highlighter that does not analyze fields or use term vectors. Instead, it requires fields indexed with Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

    Inheritance
    object
    ICUPostingsHighlighter
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Search.PostingsHighlight
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public class ICUPostingsHighlighter
    Remarks

    PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses an ICU4N.Text.BreakIterator to find passages in the text; by default, it breaks the text into sentences using GetSentenceInstance(CultureInfo). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Finally, passages are formatted into highlighted snippets with a PassageFormatter.
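
    The passage-selection loop described above can be sketched in plain C# as follows. This is a simplified illustration, not the library's implementation: the class name is hypothetical, passage boundaries are assumed to be precomputed (the highlighter obtains them from its BreakIterator), and each passage is scored by its hit count instead of by a PassageScorer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only; NOT the ICUPostingsHighlighter implementation.
// Assumptions: passages are sorted, non-overlapping [Start, End) ranges, and a
// passage's score is simply the number of query-term hits it contains.
public static class PassageSelectionSketch
{
    public static List<(int Start, int End, int Hits)> TopPassages(
        IReadOnlyList<(int Start, int End)> passages,
        IEnumerable<int> hitOffsets, // start offsets of all query-term hits, any order
        int maxPassages)
    {
        var hits = new int[passages.Count];
        int p = 0;

        // Visit hits in offset order (the "merge sort by offset"), advancing
        // through the passages in parallel and coalescing hits per passage.
        foreach (int offset in hitOffsets.OrderBy(o => o))
        {
            while (p < passages.Count && offset >= passages[p].End) p++;
            if (p == passages.Count) break;             // hit lies past the last passage
            if (offset >= passages[p].Start) hits[p]++;
        }

        return Enumerable.Range(0, passages.Count)
            .Where(i => hits[i] > 0)
            .OrderByDescending(i => hits[i])            // best-scoring passages first
            .ThenBy(i => passages[i].Start)             // earlier passage wins a tie
            .Take(maxPassages)
            .OrderBy(i => passages[i].Start)            // present the top-N in document order
            .Select(i => (passages[i].Start, passages[i].End, hits[i]))
            .ToList();
    }
}
```

    With passages [0, 10), [10, 25) and [25, 40), hits at offsets 3, 12, 15 and 30, and maxPassages = 2, the sketch keeps the middle passage (2 hits) and the first passage (1 hit, earlier than the third), returned in document order.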

    You can customize the behavior by subclassing this highlighter. Some important hooks:
    • GetBreakIterator(string): Customize how the text is divided into passages.
    • GetScorer(string): Customize how passages are ranked.
    • GetFormatter(string): Customize how snippets are formatted.
    • GetIndexAnalyzer(string): Enable highlighting of MultiTermQuery instances such as Lucene.Net.Search.WildcardQuery.

    WARNING: The code is very new and probably still has some exciting bugs!

    Example usage:
    // configure field with offsets at index time
    FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
    offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
    Field body = new Field("body", "foobar", offsetsType);

    // retrieve highlights at query time
    // (searcher is an existing IndexSearcher; n is the number of top hits to fetch)
    ICUPostingsHighlighter highlighter = new ICUPostingsHighlighter();
    Query query = new TermQuery(new Term("body", "highlighting"));
    TopDocs topDocs = searcher.Search(query, n);
    string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);

    This class is thread-safe and can be used across different readers.

    Note that the .NET implementation differs from the PostingsHighlighter in Lucene in that it is backed by an ICU ICU4N.Text.RuleBasedBreakIterator, whose default behavior differs slightly from that of the JDK break iterator. However, the ICU ICU4N.Text.RuleBasedBreakIterator can be customized to handle many scenarios that the JDK one cannot. See the ICU documentation at http://userguide.icu-project.org/boundaryanalysis/break-rules for more information on how to pass custom rules to an ICU ICU4N.Text.RuleBasedBreakIterator.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Constructors

    ICUPostingsHighlighter()

    Creates a new highlighter with DEFAULT_MAX_LENGTH.

    Declaration
    public ICUPostingsHighlighter()

    ICUPostingsHighlighter(int)

    Creates a new highlighter, specifying maximum content length.

    Declaration
    public ICUPostingsHighlighter(int maxLength)
    Parameters
    Type Name Description
    int maxLength

    maximum content size to process.

    Exceptions
    Type Condition
    ArgumentException

    if maxLength is negative or equal to int.MaxValue
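
    The documented constructor contract can be sketched with a hypothetical helper. The Limit method encodes an assumption about what "maximum content size to process" means, namely that only the first maxLength characters of a field value are considered for highlighting.

```csharp
using System;

// Hypothetical helpers illustrating the documented maxLength contract;
// they are not part of the ICUPostingsHighlighter API.
public static class MaxLengthSketch
{
    // Documented rule: maxLength must not be negative or equal to int.MaxValue.
    public static int Validate(int maxLength)
    {
        if (maxLength < 0 || maxLength == int.MaxValue)
            throw new ArgumentException(
                "maxLength must be non-negative and less than int.MaxValue", nameof(maxLength));
        return maxLength;
    }

    // Assumption: content beyond the first maxLength characters is ignored.
    public static string Limit(string content, int maxLength) =>
        content.Length > Validate(maxLength) ? content.Substring(0, maxLength) : content;
}
```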

    Fields

    DEFAULT_MAX_LENGTH

    Default maximum content size to process. Snippets closer to the beginning of a document typically summarize its content better.

    Declaration
    public static readonly int DEFAULT_MAX_LENGTH
    Field Value
    Type Description
    int

    Methods

    GetBreakIterator(string)

    Returns the ICU4N.Text.BreakIterator to use for dividing text into passages. By default, this returns a sentence instance from GetSentenceInstance(CultureInfo); subclasses can override to customize.

    Declaration
    protected virtual BreakIterator GetBreakIterator(string field)
    Parameters
    Type Name Description
    string field
    Returns
    Type Description
    BreakIterator

    GetEmptyHighlight(string, BreakIterator, int)

    Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize.

    Declaration
    protected virtual Passage[] GetEmptyHighlight(string fieldName, BreakIterator bi, int maxPassages)
    Parameters
    Type Name Description
    string fieldName
    BreakIterator bi
    int maxPassages
    Returns
    Type Description
    Passage[]
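
    The default fallback (the first maxPassages sentences) can be sketched as below. The sentence splitter is a deliberately naive, hypothetical stand-in for the ICU4N sentence BreakIterator, and the sketch returns sentence strings rather than Passage objects.

```csharp
using System.Collections.Generic;

// Sketch of the documented no-hits fallback. The splitter is a naive stand-in
// for an ICU sentence BreakIterator: it breaks after '.', '!' or '?' when the
// next character is whitespace or the end of the text.
public static class EmptyHighlightSketch
{
    public static List<string> FirstSentences(string text, int maxPassages)
    {
        var result = new List<string>();
        int start = 0;
        for (int i = 0; i < text.Length && result.Count < maxPassages; i++)
        {
            bool terminator = text[i] == '.' || text[i] == '!' || text[i] == '?';
            bool atBoundary = i + 1 == text.Length || char.IsWhiteSpace(text[i + 1]);
            if (terminator && atBoundary)
            {
                result.Add(text.Substring(start, i + 1 - start).Trim());
                start = i + 1;
            }
        }
        // Trailing text without a terminator still counts as a final sentence.
        if (result.Count < maxPassages && start < text.Length
            && !string.IsNullOrWhiteSpace(text.Substring(start)))
        {
            result.Add(text.Substring(start).Trim());
        }
        return result;
    }
}
```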

    GetFormatter(string)

    Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.

    Declaration
    protected virtual PassageFormatter GetFormatter(string field)
    Parameters
    Type Name Description
    string field
    Returns
    Type Description
    PassageFormatter
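
    To show what a formatter does with the selected passages, here is a simplified sketch that wraps matches in tags and separates non-adjacent passages with an ellipsis. The tag strings, the ellipsis, the tuple-based passage shape, and the class name are assumptions for illustration, not the PassageFormatter API.

```csharp
using System.Collections.Generic;
using System.Text;

// Simplified snippet-formatting sketch; not the PassageFormatter API.
// Passages must be in document order, and each passage's matches must be
// sorted, non-overlapping, and inside the passage.
public static class SnippetSketch
{
    public static string Format(
        string content,
        IEnumerable<(int Start, int End, (int Start, int End)[] Matches)> passages,
        string preTag = "<b>", string postTag = "</b>", string ellipsis = "... ")
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (var passage in passages)
        {
            // Separate non-adjacent passages with an ellipsis.
            if (passage.Start > pos && pos > 0) sb.Append(ellipsis);
            pos = passage.Start;
            foreach (var m in passage.Matches)
            {
                sb.Append(content, pos, m.Start - pos);   // text before the match
                sb.Append(preTag).Append(content, m.Start, m.End - m.Start).Append(postTag);
                pos = m.End;
            }
            sb.Append(content, pos, passage.End - pos);    // rest of the passage
            pos = passage.End;
        }
        return sb.ToString();
    }
}
```

    For the content "Lucene is fast. Search is fun. Highlight this." with passages [0, 16) and [31, 46) matching "fast" and "this", the sketch produces: Lucene is <b>fast</b>. ... Highlight <b>this</b>.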

    GetIndexAnalyzer(string)

    Returns the analyzer originally used to index the content of field.

    This is used to highlight some Lucene.Net.Search.MultiTermQuery instances.
    Declaration
    protected virtual Analyzer GetIndexAnalyzer(string field)
    Parameters
    Type Name Description
    string field
    Returns
    Type Description
    Analyzer

    Lucene.Net.Analysis.Analyzer or null (the default, meaning no special multi-term processing)


    GetMultiValuedSeparator(string)

    Returns the logical separator between values for multi-valued fields. The default is a space character, which means passages can span values. A subclass can override this, for example by returning U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting.

    Declaration
    protected virtual char GetMultiValuedSeparator(string field)
    Parameters
    Type Name Description
    string field
    Returns
    Type Description
    char
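
    The effect of the separator can be sketched by joining the values into one string, as the highlighter is assumed to do before passage breaking. With the default space, a sentence may run from one value into the next; U+2029 is a paragraph separator, after which Unicode sentence segmentation (UAX #29) always breaks. The helper names are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical helpers: join multi-valued field values with a separator and
// report where each value starts inside the joined text.
public static class MultiValueSketch
{
    public static string Join(IEnumerable<string> values, char separator) =>
        string.Join(separator.ToString(), values);

    // Start offset of each value in the joined text (the separator is one char).
    public static List<int> ValueStarts(IReadOnlyList<string> values)
    {
        var starts = new List<int>();
        int offset = 0;
        foreach (var value in values)
        {
            starts.Add(offset);
            offset += value.Length + 1; // +1 for the separator
        }
        return starts;
    }
}
```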

    GetScorer(string)

    Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.

    Declaration
    protected virtual PassageScorer GetScorer(string field)
    Parameters
    Type Name Description
    string field
    Returns
    Type Description
    PassageScorer
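
    The ranking step can be illustrated with a BM25-style formula in which each passage is treated as a small document and its score is a position norm times a sum of per-term weight × tf products. The constants (k1 = 1.2, b = 0.75, a pivot of 87 characters) and the exact formulas below are assumptions for this sketch, not a specification of the default PassageScorer; the class name is hypothetical.

```csharp
using System;

// Illustrative BM25-style passage scoring; formulas and constants are
// assumptions for this sketch, not a copy of PassageScorer.
public static class PassageScoreSketch
{
    private const double K1 = 1.2, B = 0.75, Pivot = 87;

    // IDF-like weight: the document is treated as a corpus of roughly
    // Pivot-character "documents", so terms frequent in it weigh less.
    public static double Weight(int contentLength, int totalTermFreq)
    {
        double numDocs = 1 + contentLength / Pivot;
        return (K1 + 1) * Math.Log(1 + (numDocs + 0.5) / (totalTermFreq + 0.5));
    }

    // Saturating term frequency, normalized by passage length.
    public static double Tf(int freq, int passageLength)
    {
        double norm = K1 * ((1 - B) + B * (passageLength / Pivot));
        return freq / (freq + norm);
    }

    // Small boost for passages near the start of the document.
    public static double Norm(int passageStart) => 1 + 1 / Math.Log(Pivot + passageStart);

    // Score = Norm(start) * sum over query terms of Weight * Tf.
    public static double Score(int passageStart, int passageLength, int contentLength,
        (int Freq, int TotalTermFreq)[] terms)
    {
        double sum = 0;
        foreach (var (freq, totalTermFreq) in terms)
            sum += Weight(contentLength, totalTermFreq) * Tf(freq, passageLength);
        return Norm(passageStart) * sum;
    }
}
```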

    Highlight(string, Query, IndexSearcher, TopDocs)

    Highlights the top passages from a single field.

    Declaration
    public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs)
    Parameters
    Type Name Description
    string field

    field name to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    Returns
    Type Description
    string[]

    Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.

    Exceptions
    Type Condition
    IOException

    if an I/O error occurred during processing

    ArgumentException

    if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

    Highlight(string, Query, IndexSearcher, TopDocs, int)

    Highlights the top-N passages from a single field.

    Declaration
    public virtual string[] Highlight(string field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
    Parameters
    Type Name Description
    string field

    field name to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    int maxPassages

    The maximum number of top-N ranked passages used to form the highlighted snippets.

    Returns
    Type Description
    string[]

    Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
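The return value described above involves two steps:
- Keep the `maxPassages` best-scoring passages.
- Fall back to the field's leading passages when nothing matched.

The sketch below illustrates that selection with plain tuples. The offsets and scores are made-up inputs; a real highlighter obtains scores from its PassageScorer. `SelectPassages` is a hypothetical helper. Re-sorting the winners by start offset is a presentation choice made here so that snippets read in document order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical passages of one document: (start offset, score); score 0 means no hits.
var passages = new List<(int Start, float Score)>
{
    (0, 0f), (40, 2.5f), (90, 0.7f), (130, 3.1f), (180, 0f),
};

Console.WriteLine(string.Join(", ", SelectPassages(passages, 2))); // prints "40, 130"

// Keep the maxPassages best-scoring passages, then order them by start offset.
// If no passage scored, fall back to the first maxPassages passages of the field.
static List<int> SelectPassages(List<(int Start, float Score)> candidates, int maxPassages)
{
    var matched = candidates.Where(c => c.Score > 0).ToList();
    IEnumerable<(int Start, float Score)> chosen = matched.Count == 0
        ? candidates.OrderBy(c => c.Start).Take(maxPassages)
        : matched.OrderByDescending(c => c.Score).Take(maxPassages).OrderBy(c => c.Start);
    return chosen.Select(c => c.Start).ToList();
}
```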

    Exceptions
    Type Condition
    IOException

    if an I/O error occurred during processing

    ArgumentException

    if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

    HighlightFields(string[], Query, IndexSearcher, TopDocs)

    Highlights the top passages from multiple fields.

    Conceptually, this behaves as a more efficient form of:
    IDictionary<string, string[]> m = new Dictionary<string, string[]>();
    foreach (string field in fields)
    {
        m[field] = Highlight(field, query, searcher, topDocs);
    }
    return m;
    Declaration
    public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
    Parameters
    Type Name Description
    string[] fields

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    Returns
    Type Description
    IDictionary<string, string[]>

    IDictionary<string, string[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
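Each array in the returned dictionary corresponds to the documents in topDocs, so element `i` of a field's array belongs to the `i`-th hit. The sketch below shows that indexing with stand-in data. The doc IDs and snippets are invented. In real code they would come from `topDocs.ScoreDocs[i].Doc` and from the `HighlightFields` result. The null check is defensive and is not a claim about what the library returns.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for topDocs.ScoreDocs[i].Doc, in ranking order.
int[] hitDocIds = { 17, 4, 92 };

// Stand-in for the dictionary returned by HighlightFields: one array per field,
// where element i holds the snippet for hit i.
var highlights = new Dictionary<string, string[]>
{
    ["title"] = new[] { "<b>Lucene</b> in Action", "Intro to <b>Lucene</b>", "Search basics" },
    ["body"] = new[] { "...the <b>Lucene</b> index...", null, "...scoring in <b>Lucene</b>..." },
};

var lines = new List<string>();
for (int i = 0; i < hitDocIds.Length; i++)
{
    foreach (var (field, snippets) in highlights)
    {
        // Defensive null check: print a placeholder when a snippet is missing.
        lines.Add($"doc {hitDocIds[i]} [{field}]: {snippets[i] ?? "(no snippet)"}");
    }
}
lines.ForEach(Console.WriteLine);
```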

    Exceptions
    Type Condition
    IOException

    if an I/O error occurred during processing

    ArgumentException

    if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

    HighlightFields(string[], Query, IndexSearcher, TopDocs, int[])

    Highlights the top-N passages from multiple fields.

    Conceptually, this behaves as a more efficient form of:
    IDictionary<string, string[]> m = new Dictionary<string, string[]>();
    for (int i = 0; i < fields.Length; i++)
    {
        // maxPassages[i] is the passage limit for fields[i]
        m[fields[i]] = Highlight(fields[i], query, searcher, topDocs, maxPassages[i]);
    }
    return m;
    Declaration
    public virtual IDictionary<string, string[]> HighlightFields(string[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages)
    Parameters
    Type Name Description
    string[] fields

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    TopDocs topDocs

    TopDocs containing the summary result documents to highlight.

    int[] maxPassages

    The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

    Returns
    Type Description
    IDictionary<string, string[]>

    IDictionary<string, string[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.

    Exceptions
    Type Condition
    IOException

    if an I/O error occurred during processing

    ArgumentException

    if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

    HighlightFields(string[], Query, IndexSearcher, int[], int[])

    Highlights the top-N passages from multiple fields, for the provided int[] docids.

    Declaration
    public virtual IDictionary<string, string[]> HighlightFields(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
    Parameters
    Type Name Description
    string[] fieldsIn

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    int[] docidsIn

    Array containing the document IDs to highlight.

    int[] maxPassagesIn

    The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

    Returns
    Type Description
    IDictionary<string, string[]>

    IDictionary<string, string[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first passages from the field (up to that field's entry in maxPassagesIn) will be returned.
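Postings are enumerated in increasing document ID order. An implementation that walks them once would therefore typically process the requested IDs in sorted order. It would then write each result back to the caller's original position, so that the output stays aligned with docidsIn. The sketch below shows only that reorder-and-restore bookkeeping. It is not the library's internal code. `HighlightInDocIdOrder` and `HighlightOne` are hypothetical placeholders, and `HighlightOne` stands in for the real per-document work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] docidsIn = { 42, 7, 19 }; // caller's order, for example ranking order

var visited = new List<int>();
string[] results = HighlightInDocIdOrder(docidsIn, visited);
Console.WriteLine("visited: " + string.Join(", ", visited)); // ascending doc IDs
Console.WriteLine(string.Join(" | ", results));             // caller's order

// Visit doc IDs in ascending order (the order postings are read in),
// but store each result at the caller's original index.
static string[] HighlightInDocIdOrder(int[] docids, List<int> visitLog)
{
    // Original indexes, sorted by the doc ID each one refers to.
    int[] order = Enumerable.Range(0, docids.Length)
                            .OrderBy(idx => docids[idx])
                            .ToArray();
    var output = new string[docids.Length];
    foreach (int idx in order)
    {
        visitLog.Add(docids[idx]);
        output[idx] = HighlightOne(docids[idx]);
    }
    return output;
}

// Hypothetical placeholder for the per-document highlighting work.
static string HighlightOne(int docId) => $"snippet for doc {docId}";
```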

    Exceptions
    Type Condition
    IOException

    if an I/O error occurred during processing

    ArgumentException

    if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

    HighlightFieldsAsObjects(string[], Query, IndexSearcher, int[], int[])

    Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom objects as returned by the PassageFormatter. Use this API to render to something other than string.

    Declaration
    protected virtual IDictionary<string, object[]> HighlightFieldsAsObjects(string[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
    Parameters
    Type Name Description
    string[] fieldsIn

    field names to highlight. Must have a stored string value and also be indexed with offsets.

    Query query

    query to highlight.

    IndexSearcher searcher

    searcher that was previously used to execute the query.

    int[] docidsIn

    Array containing the document IDs to highlight.

    int[] maxPassagesIn

    The maximum number of top-N ranked passages per-field used to form the highlighted snippets.

    Returns
    Type Description
    IDictionary<string, object[]>

    IDictionary<string, object[]> keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first passages from the field (up to that field's entry in maxPassagesIn) will be returned.
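As an illustration of rendering to something other than string, a formatter could return structured fragments instead of a marked-up string. Plain text and matched text would be separate pieces, and a UI layer would style them itself. The sketch below builds such fragments from a passage's text and its match ranges. It uses no Lucene.NET types. `ToFragments` and the `(Text, IsMatch)` tuple shape are hypothetical, and a real implementation would produce its object from inside a custom PassageFormatter.

```csharp
using System;
using System.Collections.Generic;

string passage = "Lucene highlights passages, not whole documents.";
// Hypothetical match ranges [Start, End) within the passage, sorted and non-overlapping.
var matches = new List<(int Start, int End)> { (0, 6), (18, 26) };

List<(string Text, bool IsMatch)> fragments = ToFragments(passage, matches);
foreach (var f in fragments)
    Console.WriteLine(f.IsMatch ? $"MATCH: {f.Text}" : $"text:  {f.Text}");

// Split the passage into alternating plain and matched fragments.
static List<(string Text, bool IsMatch)> ToFragments(string text, List<(int Start, int End)> ranges)
{
    var result = new List<(string Text, bool IsMatch)>();
    int pos = 0;
    foreach (var (start, end) in ranges)
    {
        if (start > pos)
            result.Add((text.Substring(pos, start - pos), false)); // text before the match
        result.Add((text.Substring(start, end - start), true));    // the matched term
        pos = end;
    }
    if (pos < text.Length)
        result.Add((text.Substring(pos), false));                   // trailing text
    return result;
}
```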

    Exceptions
    Type Condition
    IOException

    if an I/O error occurred during processing

    ArgumentException

    if field was indexed without Lucene.Net.Index.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

    LoadFieldValues(IndexSearcher, string[], int[], int)

    Loads the string values for each (field, docID) pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate a string[fields.Length][docids.Length] jagged array and fill in all values. The returned strings must be identical to what was indexed.
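The expected result shape has one string[] per field and one entry per document ID. The sketch below builds that shape as a jagged array. `loadStored` is a hypothetical callback standing in for a stored-field read. Truncating each value to `maxLength` characters is an assumption about how that parameter is used. An override that sources text elsewhere must still return exactly what was indexed, because highlight offsets point into the indexed text.

```csharp
using System;
using System.Collections.Generic;

string[] fields = { "title", "body" };
int[] docids = { 3, 8 };

// Hypothetical loader: builds a fake stored value for each (field, doc) pair.
IList<string[]> values = LoadFieldValues(fields, docids, 10,
    (field, doc) => $"{field} text of document {doc}");

for (int f = 0; f < fields.Length; f++)
    Console.WriteLine($"{fields[f]}: {string.Join(" | ", values[f])}");

// Allocate a [fieldNames.Length][docIds.Length] jagged array and fill every slot.
static IList<string[]> LoadFieldValues(string[] fieldNames, int[] docIds, int maxLength,
                                       Func<string, int, string> loadStored)
{
    var contents = new string[fieldNames.Length][];
    for (int i = 0; i < fieldNames.Length; i++)
    {
        contents[i] = new string[docIds.Length];
        for (int j = 0; j < docIds.Length; j++)
        {
            // A missing value becomes an empty string rather than null.
            string value = loadStored(fieldNames[i], docIds[j]) ?? string.Empty;
            // Assumption: maxLength caps the number of characters kept per value.
            contents[i][j] = value.Length > maxLength ? value.Substring(0, maxLength) : value;
        }
    }
    return contents; // string[][] implements IList<string[]>
}
```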

    Declaration
    protected virtual IList<string[]> LoadFieldValues(IndexSearcher searcher, string[] fields, int[] docids, int maxLength)
    Parameters
    Type Name Description
    IndexSearcher searcher
    string[] fields
    int[] docids
    int maxLength
    Returns
    Type Description
    IList<string[]>

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.