
    Class ICUNormalizer2Filter

    Normalizes token text using ICU's ICU4N.Text.Normalizer2.

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    TokenFilter
    ICUNormalizer2Filter
    ICUFoldingFilter
    Implements
    System.IDisposable
    Inherited Members
    TokenFilter.m_input
    TokenFilter.End()
    TokenFilter.Dispose(Boolean)
    TokenFilter.Reset()
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Icu
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public class ICUNormalizer2Filter : TokenFilter, IDisposable
    Remarks

    With this filter, you can normalize text in the following ways:

    • NFKC Normalization, Case Folding, and removing Ignorables (the default)
    • Using a standard Normalization mode (NFC, NFD, NFKC, NFKD)
    • Using rules from a custom normalization mapping

    If you use the defaults, this filter is a simple way to standardize Unicode text in a language-independent way for search:

    • The case folding that it does can be seen as a replacement for LowerCaseFilter: For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" will match correctly.
    • The normalization standardizes different forms of the same character in Unicode. For example, CJK full-width numbers are converted to their ASCII forms.
    • Ignorables such as Zero-Width Joiner and Variation Selectors are removed. These are typically modifier characters that affect display.
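    The default behavior described above can be sketched as follows. This is a minimal illustration, not a definitive setup: the class `IcuNormalizeDemo` and the helper `NormalizeTokens` are hypothetical names, and the choice of `WhitespaceTokenizer` with `LuceneVersion.LUCENE_48` is an assumption made only to have a token source to filter.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Icu;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class IcuNormalizeDemo
{
    // Hypothetical helper: splits the text on whitespace, then applies the
    // filter's default normalization (NFKC_Casefold) to every token.
    public static List<string> NormalizeTokens(string text)
    {
        var terms = new List<string>();

        // Assumption: a whitespace tokenizer is used purely as a simple token source.
        Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));

        // The single-argument constructor selects the default normalization.
        using (TokenStream stream = new ICUNormalizer2Filter(tokenizer))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                    // required before the first IncrementToken()
            while (stream.IncrementToken())    // returns false once the input is exhausted
            {
                terms.Add(termAtt.ToString()); // the term text has already been normalized
            }
            stream.End();
        }

        return terms;
    }
}
```

    With a helper like this, `NormalizeTokens("Μάϊος")` and `NormalizeTokens("ΜΆΪΟΣ")` should yield the same term, and a full-width "１２３" should come back as ASCII "123", which is the matching behavior the remarks describe.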

    Constructors


    ICUNormalizer2Filter(TokenStream)

    Creates a new ICUNormalizer2Filter that combines NFKC normalization, case folding, and removal of Default Ignorables (NFKC_Casefold).

    Declaration
    public ICUNormalizer2Filter(TokenStream input)
    Parameters
    Type         Name   Description
    TokenStream  input

    ICUNormalizer2Filter(TokenStream, Normalizer2)

    Creates a new ICUNormalizer2Filter with the specified ICU4N.Text.Normalizer2.

    Declaration
    public ICUNormalizer2Filter(TokenStream input, Normalizer2 normalizer)
    Parameters
    Type                    Name        Description
    TokenStream             input       stream
    ICU4N.Text.Normalizer2  normalizer  normalizer to use
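    As a rough sketch of the two-argument constructor, the snippet below wraps a stream with plain NFC composition instead of the default. The names `IcuNormalizerFactory` and `WithNfc` are hypothetical. The call `Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose)` assumes that ICU4N mirrors ICU4J's `Normalizer2.getInstance(data, name, mode)`; check it against the ICU4N version in use.

```csharp
using ICU4N.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Icu;

public static class IcuNormalizerFactory
{
    // Hypothetical helper: applies standard NFC normalization to each token
    // of the given stream, with no case folding and no removal of ignorables.
    public static TokenStream WithNfc(TokenStream input)
    {
        // Assumption: this factory call follows ICU4J's getInstance(data, name, mode);
        // passing null for the data stream selects ICU's built-in "nfc" data.
        Normalizer2 nfc = Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose);

        // Pass the chosen normalizer to the filter explicitly.
        return new ICUNormalizer2Filter(input, nfc);
    }
}
```

    The returned stream is consumed like any other TokenStream: call Reset(), loop on IncrementToken(), then call End() and Dispose().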

    Methods


    IncrementToken()

    Declaration
    public override sealed bool IncrementToken()
    Returns
    Type Description
    System.Boolean
    Overrides
    TokenStream.IncrementToken()

    Implements

    System.IDisposable

    See Also

    ICU4N.Text.Normalizer2
    ICU4N.Text.FilteredNormalizer2
    Copyright © 2019 Licensed to the Apache Software Foundation (ASF)