    Class ICUFoldingFilter

    A Lucene.Net.Analysis.TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

    Inheritance
    object
    AttributeSource
    TokenStream
    TokenFilter
    ICUNormalizer2Filter
    ICUFoldingFilter
    Implements
    IDisposable
    Inherited Members
    ICUNormalizer2Filter.IncrementToken()
    TokenFilter.End()
    TokenFilter.Reset()
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis.Icu
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public sealed class ICUFoldingFilter : ICUNormalizer2Filter, IDisposable
    Remarks

    This filter applies the following foldings from UTR#30 to Unicode text:

    • Accent removal
    • Case folding
    • Canonical duplicates folding
    • Dashes folding
    • Diacritic removal (including stroke, hook, descender)
    • Greek letterforms folding
    • Han Radical folding
    • Hebrew Alternates folding
    • Jamo folding
    • Letterforms folding
    • Math symbol folding
    • Multigraph Expansions: All
    • Native digit folding
    • No-break folding
    • Overline folding
    • Positional forms folding
    • Small forms folding
    • Space folding
    • Spacing Accents folding
    • Subscript folding
    • Superscript folding
    • Suzhou Numeral folding
    • Symbol folding
    • Underline folding
    • Vertical forms folding
    • Width folding

    Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
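
    As a minimal sketch of where these foldings take effect, the filter can be placed in an analyzer so that the same folding runs on indexed text and on query text. The class name FoldingAnalyzer and the choice of WhitespaceTokenizer are assumptions made for illustration and are not part of this API; the sketch also assumes the Lucene.Net.Analysis.Common package is referenced alongside Lucene.Net.ICU.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;   // WhitespaceTokenizer (Lucene.Net.Analysis.Common)
using Lucene.Net.Analysis.Icu;    // ICUFoldingFilter
using Lucene.Net.Util;            // LuceneVersion

// Hypothetical analyzer: splits on whitespace, then folds each token.
public sealed class FoldingAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The tokenizer is the source of tokens; the filter wraps it.
        Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        TokenStream folded = new ICUFoldingFilter(tokenizer);
        return new TokenStreamComponents(tokenizer, folded);
    }
}
```

    Using one analyzer for both indexing and searching keeps the folded forms consistent, so that a folded query term can match a folded indexed term.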

    Constructors

    ICUFoldingFilter(TokenStream)

    Creates a new ICUFoldingFilter on the specified input.

    Declaration
    public ICUFoldingFilter(TokenStream input)
    Parameters
    Type Name Description
    TokenStream input
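
    The constructor can also be used directly on a tokenizer. The following is a sketch under stated assumptions: the FoldingDemo class, the sample text, and the use of WhitespaceTokenizer (from Lucene.Net.Analysis.Common) are illustrative choices and are not prescribed by this API. It follows the usual consumer sequence of Reset(), IncrementToken(), End(), and Dispose().

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;             // WhitespaceTokenizer
using Lucene.Net.Analysis.Icu;              // ICUFoldingFilter
using Lucene.Net.Analysis.TokenAttributes;  // ICharTermAttribute
using Lucene.Net.Util;                      // LuceneVersion

public static class FoldingDemo
{
    public static void Main()
    {
        // Sample input with accents, mixed case, and fullwidth letters.
        var reader = new StringReader("Résumé ＦＵＬＬ");
        Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);

        // Disposing the filter also disposes the wrapped tokenizer.
        using (TokenStream stream = new ICUFoldingFilter(tokenizer))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Each token is printed after folding has been applied.
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```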

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.