
    Class ICUTokenizer

    Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/)

    Text is first split at script boundaries. Each single-script run is then segmented into words by the BreakIterator that the ICUTokenizerConfig supplies for that script, and each word is assigned a token type by the same config.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
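
    A minimal sketch of the basic consume loop, written as a C# 9 top-level program. It assumes a reference to the Lucene.Net.ICU package and that the attribute interfaces live in the Lucene.Net.Analysis.TokenAttributes namespace (as in recent 4.8.0 betas; older betas may spell it differently). The sample text mixes Latin, Greek and Cyrillic words, so the tokenizer splits it at script boundaries before segmenting each run. The token types printed depend on the configuration and are not asserted here.

```csharp
// Sketch: consume an ICUTokenizer built with the default configuration.
// Assumes the Lucene.Net.ICU package is referenced (C# 9 top-level program).
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace; check your Lucene.NET version

var tokens = new List<string>();

// Latin, Greek and Cyrillic words separated by spaces.
using (var tokenizer = new ICUTokenizer(new StringReader("Hello κόσμε мир")))
{
    // Register attributes once; the tokenizer overwrites them in place for each token.
    var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
    var offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();
    var typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

    tokenizer.Reset();                   // required before the first IncrementToken()
    while (tokenizer.IncrementToken())   // returns false once the input is exhausted
    {
        tokens.Add(termAtt.ToString());
        Console.WriteLine($"{termAtt} [{offsetAtt.StartOffset}-{offsetAtt.EndOffset}] {typeAtt.Type}");
    }
    tokenizer.End();                     // end-of-stream work, such as the final offset
}                                        // leaving the using block calls Dispose(), closing the reader
```

    The using block guarantees Dispose() runs even if consumption throws partway through.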
    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    ICUTokenizer
    Implements
    System.IDisposable
    Inherited Members
    Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Icu.Segmentation
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public sealed class ICUTokenizer : Tokenizer, IDisposable

    Constructors


    ICUTokenizer(AttributeSource.AttributeFactory, TextReader, ICUTokenizerConfig)

    Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.

    Declaration
    public ICUTokenizer(AttributeSource.AttributeFactory factory, TextReader input, ICUTokenizerConfig config)
    Parameters
    Type                              Name     Description
    AttributeSource.AttributeFactory  factory  AttributeSource.AttributeFactory to use.
    System.IO.TextReader              input    System.IO.TextReader containing text to tokenize.
    ICUTokenizerConfig                config   Tailored ICU4N.Text.BreakIterator configuration.
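
    A sketch of the three-argument constructor, passing the default attribute factory explicitly; a custom factory would go in the same position. It assumes the static field AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY (AttributeSource lives in Lucene.Net.Util) and a two-argument DefaultICUTokenizerConfig(bool cjkAsWords, bool myanmarAsWords) constructor as found in current Lucene.NET 4.8 betas. Both are assumptions to verify against the version in use.

```csharp
// Sketch: ICUTokenizer(AttributeFactory, TextReader, ICUTokenizerConfig).
// Assumes the Lucene.Net.ICU package is referenced (C# 9 top-level program).
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace
using Lucene.Net.Util;                      // AttributeSource

// The factory that creates attribute instances for this stream (assumed field name).
AttributeSource.AttributeFactory factory = AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

// Assumed constructor arguments: (cjkAsWords, myanmarAsWords).
ICUTokenizerConfig config = new DefaultICUTokenizerConfig(true, true);

var tokens = new List<string>();
using (var tokenizer = new ICUTokenizer(factory, new StringReader("tailored break iteration"), config))
{
    var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
    tokenizer.Reset();
    while (tokenizer.IncrementToken())
        tokens.Add(termAtt.ToString());
    tokenizer.End();
}

Console.WriteLine(string.Join(" | ", tokens));
```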


    ICUTokenizer(TextReader)

    Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader.

    Declaration
    public ICUTokenizer(TextReader input)
    Parameters
    Type                  Name   Description
    System.IO.TextReader  input  System.IO.TextReader containing text to tokenize.

    Remarks

    The default script-specific handling is used.

    The default attribute factory is used.

    See Also
    DefaultICUTokenizerConfig

    ICUTokenizer(TextReader, ICUTokenizerConfig)

    Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.

    Declaration
    public ICUTokenizer(TextReader input, ICUTokenizerConfig config)
    Parameters
    Type                  Name    Description
    System.IO.TextReader  input   System.IO.TextReader containing text to tokenize.
    ICUTokenizerConfig    config  Tailored ICU4N.Text.BreakIterator configuration.

    Remarks

    The default attribute factory is used.
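
    A sketch comparing two configurations on the same Chinese input, to show where the ICUTokenizerConfig argument takes effect. It assumes the two-argument DefaultICUTokenizerConfig(bool cjkAsWords, bool myanmarAsWords) constructor of current Lucene.NET 4.8 betas; the helper Tokenize is hypothetical and exists only for this illustration. No expected output is given, because the dictionary-based segmentation depends on the ICU4N data shipped with the package.

```csharp
// Sketch: the same CJK input tokenized under two DefaultICUTokenizerConfig settings.
// Assumes the Lucene.Net.ICU package is referenced (C# 9 top-level program).
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace

// Hypothetical helper: tokenize text with the given config and return the term texts.
static List<string> Tokenize(string text, ICUTokenizerConfig config)
{
    var terms = new List<string>();
    using var tokenizer = new ICUTokenizer(new StringReader(text), config);
    var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
    tokenizer.Reset();
    while (tokenizer.IncrementToken())
        terms.Add(termAtt.ToString());
    tokenizer.End();
    return terms;
}

const string input = "北京大学生";

// Assumed constructor arguments: (cjkAsWords, myanmarAsWords).
var uax29 = Tokenize(input, new DefaultICUTokenizerConfig(false, false));
var dictionary = Tokenize(input, new DefaultICUTokenizerConfig(true, false));

Console.WriteLine("cjkAsWords=false: " + string.Join(" | ", uax29));
Console.WriteLine("cjkAsWords=true:  " + string.Join(" | ", dictionary));
```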

    Methods


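    End()

    Performs end-of-stream operations after IncrementToken() has returned false, such as setting the final offset.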

    Declaration
    public override void End()
    Overrides
    TokenStream.End()

    IncrementToken()

    Advances the stream to the next token, updating the term text, offset and token type attributes.

    Declaration
    public override bool IncrementToken()
    Returns
    Type            Description
    System.Boolean  true if a token was produced; false when the input is exhausted.
    Overrides
    TokenStream.IncrementToken()

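    Reset()

    Prepares the tokenizer to consume its current input. Must be called before the first IncrementToken() call on each input.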

    Declaration
    public override void Reset()
    Overrides
    Tokenizer.Reset()
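
    The three methods above follow the standard TokenStream consumer contract: Reset(), then IncrementToken() until it returns false, then End(), then Dispose(). A sketch of reusing one tokenizer for several inputs through the inherited Tokenizer.SetReader(TextReader), assuming (as in Lucene's analyzer reuse) that SetReader is only legal once the previous input has been disposed.

```csharp
// Sketch: reuse one ICUTokenizer instance across inputs via SetReader().
// Assumes the Lucene.Net.ICU package is referenced (C# 9 top-level program).
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace

string[] texts = { "first document", "the second document" };
var results = new List<List<string>>();

var tokenizer = new ICUTokenizer(new StringReader(texts[0]));
var termAtt = tokenizer.AddAttribute<ICharTermAttribute>(); // registered once, reused for every input

for (int i = 0; i < texts.Length; i++)
{
    // The first reader came from the constructor; later ones are supplied here.
    if (i > 0)
        tokenizer.SetReader(new StringReader(texts[i]));

    var tokens = new List<string>();
    tokenizer.Reset();                  // switches to the pending reader
    while (tokenizer.IncrementToken())
        tokens.Add(termAtt.ToString());
    tokenizer.End();
    tokenizer.Dispose();                // closes this reader so SetReader() may be called again
    results.Add(tokens);
}

foreach (var list in results)
    Console.WriteLine(string.Join(" | ", list));
```

    Lucene.NET analyzers rely on the same pattern internally, which is why the Dispose() call belongs inside the loop rather than after it.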

    Implements

    System.IDisposable

    See Also

    ICUTokenizerConfig
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)