    Class ICUNormalizer2Filter

    Normalize token text with ICU's Normalizer2.

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    TokenFilter
    ICUNormalizer2Filter
    ICUFoldingFilter
    Implements
    IDisposable
    Inherited Members
    TokenFilter.m_input
    TokenFilter.End()
    TokenFilter.Dispose(Boolean)
    TokenFilter.Reset()
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.Icu
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public class ICUNormalizer2Filter : TokenFilter, IDisposable
    Remarks

    With this filter, you can normalize text in the following ways:

    • NFKC Normalization, Case Folding, and removing Ignorables (the default)
    • Using a standard Normalization mode (NFC, NFD, NFKC, NFKD)
    • Based on rules from a custom normalization mapping.

    If you use the defaults, this filter is a simple way to standardize Unicode text in a language-independent way for search:

    • Its case folding can be seen as a replacement for LowerCaseFilter. For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" will match correctly.
    • The normalization standardizes different forms of the same character in Unicode. For example, CJK full-width numbers are standardized to their ASCII forms.
    • Ignorables such as Zero-Width Joiner and Variation Selectors are removed. These are typically modifier characters that affect display.
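
    The default behavior described above can be sketched as follows. This is a minimal illustration, not the library's own sample: the DefaultNormalizationDemo class, the Analyze helper, and the field name "body" are hypothetical, and the sketch assumes the Lucene.Net.Analysis.Common package (for WhitespaceTokenizer) alongside Lucene.Net.ICU, with LuceneVersion.LUCENE_48 as the match version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Icu;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class DefaultNormalizationDemo
{
    // Hypothetical helper: splits text on whitespace, passes each token through
    // ICUNormalizer2Filter with its default settings, and collects the terms.
    public static List<string> Analyze(string text)
    {
        var terms = new List<string>();
        using (Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
            // Single-argument constructor: NFKC + case folding + ignorable removal.
            TokenStream filter = new ICUNormalizer2Filter(tokenizer);
            return new TokenStreamComponents(tokenizer, filter);
        }))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Per the remarks, both Greek spellings should produce the same term,
        // and the full-width digits should come out as ASCII digits.
        Console.WriteLine(string.Join(" ", Analyze("ΜΆΪΟΣ Μάϊος １２３")));
    }
}
```

    Placing the filter directly after the tokenizer means later filters in the chain see already-normalized terms, which is usually what a search pipeline wants.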

    Constructors

    ICUNormalizer2Filter(TokenStream)

    Create a new ICUNormalizer2Filter that combines NFKC normalization, case folding, and removal of Default Ignorables (NFKC_Casefold).

    Declaration
    public ICUNormalizer2Filter(TokenStream input)
    Parameters
    Type Name Description
    TokenStream input

    ICUNormalizer2Filter(TokenStream, Normalizer2)

    Create a new ICUNormalizer2Filter with the specified Normalizer2.

    Declaration
    public ICUNormalizer2Filter(TokenStream input, Normalizer2 normalizer)
    Parameters
    Type Name Description
    TokenStream input stream
    Normalizer2 normalizer normalizer to use
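
    A sketch of the two-argument constructor with a standard normalization mode (NFC) instead of the default. It assumes that Normalizer2 comes from the ICU4N.Text namespace and that Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose) is available in the ICU4N version in use; the CustomNormalizerDemo class, the Analyze helper, and the field name "body" are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ICU4N.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Icu;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CustomNormalizerDemo
{
    // Hypothetical helper: tokenizes on whitespace and applies plain NFC
    // composition. Unlike the default, NFC does not fold case.
    public static List<string> Analyze(string text)
    {
        // Assumption: a null data stream selects ICU's built-in "nfc" data.
        Normalizer2 nfc = Normalizer2.GetInstance(null, "nfc", Normalizer2Mode.Compose);

        var terms = new List<string>();
        using (Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
            TokenStream filter = new ICUNormalizer2Filter(tokenizer, nfc);
            return new TokenStreamComponents(tokenizer, filter);
        }))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // "e" followed by U+0301 (combining acute accent) should compose
        // to the single code point U+00E9 under NFC.
        List<string> terms = Analyze("e\u0301");
        Console.WriteLine(terms[0].Length);
    }
}
```

    Choosing NFC rather than the default keeps case and compatibility distinctions intact, which may suit fields where exact-case matching matters.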

    Methods

    IncrementToken()

    Declaration
    public override sealed bool IncrementToken()
    Returns
    Type Description
    System.Boolean
    Overrides
    TokenStream.IncrementToken()

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)