    Lucene.Net.ICU

    This module exposes functionality from ICU to Apache Lucene. ICU4N is a .NET library that enhances .NET's internationalization support by improving performance, keeping current with the Unicode Standard, and providing richer APIs.

    Note

    Since the .NET platform doesn't provide a BreakIterator class (or similar), the functionality that depends on one was consolidated from Java Lucene's analyzers-icu package, Lucene.Net.Analysis.Common, and Lucene.Net.Highlighter into this unified package.

    Warning

    While ICU4N's BreakIterator has customizable rules, its default behavior is not the same as that of the BreakIterator in the JDK. Features of this package outside of the Lucene.Net.Analysis.Icu namespace will therefore behave differently than they do in Java Lucene, and the rules may need some tweaking to fit your needs. See the Break Rules ICU documentation for details on how to customize ICU4N.Text.RuleBasedBreakIterator.
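
    For reference, the boundary-iteration pattern that the warning refers to can be sketched with the JDK's own java.text.BreakIterator. This is a minimal Java illustration of the JDK side of the comparison only, not the ICU4N or Lucene.Net API; the class name JdkWordBreaks and the helper words are hypothetical names chosen for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class: shows how the JDK's default word rules segment text.
class JdkWordBreaks {
    /** Splits text at JDK word boundaries, keeping only segments with a letter or digit. */
    public static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries until the iterator reports DONE.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Punctuation and whitespace are also returned as segments; skip them.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world", Locale.ROOT));
    }
}
```

    Running the same input through an ICU-based iterator is a quick way to see where the two rule sets diverge for your data before adjusting any rules.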

    This module exposes the following functionality:

    • Text Analysis: For an introduction to Lucene's analysis API, see the Lucene.Net.Analysis package documentation.

      • Text Segmentation: Tokenizes text based on properties and rules defined in Unicode.

      • Collation: Compare strings according to the conventions and standards of a particular language, region or country.

      • Normalization: Converts text to a unique, equivalent form.

      • Case Folding: Removes case distinctions with Unicode's Default Caseless Matching algorithm.

      • Search Term Folding: Removes distinctions (such as accent marks) between similar characters for a loose or fuzzy search.

      • Text Transformation: Transforms Unicode text in a context-sensitive fashion, e.g. mapping Traditional to Simplified Chinese.

      • Thai Language Analysis

    • Unicode Highlighter Support

      • Postings Highlighter: Highlighter implementation that uses offsets from postings lists.

      • Vector Highlighter: An implementation of IBoundaryScanner for use with the vector highlighter in the Lucene.Net.Highlighter module.
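
    The normalization and folding items in the list above can be illustrated with a small, language-neutral sketch. It uses only the JDK's java.text.Normalizer, not the filters in this package, and it approximates folding by stripping combining marks and lowercasing, which is much cruder than Unicode's Default Caseless Matching or ICU's folding rules. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch: JDK-only approximations of normalization and search-term folding.
class UnicodeFoldingSketch {
    /** Canonical composition (NFC): canonically equivalent sequences map to one form. */
    public static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Rough search-term folding: decompose, drop combining marks, then lowercase. */
    public static String foldForSearch(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches combining marks such as U+0301 COMBINING ACUTE ACCENT.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Assumption: lowercasing stands in for true Unicode case folding here.
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String composed = "\u00e9";     // e with acute accent as a single code point
        String decomposed = "e\u0301";  // e followed by a combining acute accent
        System.out.println(nfc(composed).equals(nfc(decomposed)));
        System.out.println(foldForSearch("R\u00e9sum\u00e9"));
    }
}
```

    In an index, the same folding has to be applied at both indexing and query time so that folded query terms match folded indexed terms.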

    Copyright © 2022 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.