    Class ICUTokenizer

    Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).

    Text is first split at script boundaries; each run is then segmented according to the BreakIterator, and its tokens are typed, as provided by the ICUTokenizerConfig.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    ICUTokenizer
    Implements
    System.IDisposable
    Inherited Members
    Lucene.Net.Analysis.Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Icu.Segmentation
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public sealed class ICUTokenizer : Tokenizer, IDisposable

    Constructors


    ICUTokenizer(AttributeSource.AttributeFactory, TextReader, ICUTokenizerConfig)

    Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.

    Declaration
    public ICUTokenizer(AttributeSource.AttributeFactory factory, TextReader input, ICUTokenizerConfig config)
    Parameters
    Type | Name | Description
    Lucene.Net.Util.AttributeSource.AttributeFactory | factory | Lucene.Net.Util.AttributeSource.AttributeFactory to use.
    System.IO.TextReader | input | System.IO.TextReader containing text to tokenize.
    ICUTokenizerConfig | config | Tailored ICU4N.Text.BreakIterator configuration.


    ICUTokenizer(TextReader)

    Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader.

    Declaration
    public ICUTokenizer(TextReader input)
    Parameters
    Type | Name | Description
    System.IO.TextReader | input | System.IO.TextReader containing text to tokenize.

    Remarks

    The default script-specific handling is used.

    The default attribute factory is used.

    See Also
    DefaultICUTokenizerConfig

    ICUTokenizer(TextReader, ICUTokenizerConfig)

    Construct a new ICUTokenizer that breaks text into words from the given System.IO.TextReader, using a tailored ICU4N.Text.BreakIterator configuration.

    Declaration
    public ICUTokenizer(TextReader input, ICUTokenizerConfig config)
    Parameters
    Type | Name | Description
    System.IO.TextReader | input | System.IO.TextReader containing text to tokenize.
    ICUTokenizerConfig | config | Tailored ICU4N.Text.BreakIterator configuration.

    Remarks

    The default attribute factory is used.

    Methods


    End()

    Declaration
    public override void End()
    Overrides
    Lucene.Net.Analysis.TokenStream.End()

    IncrementToken()

    Declaration
    public override bool IncrementToken()
    Returns
    Type | Description
    System.Boolean | true if the stream advanced to a new token; false once the end of the stream has been reached.
    Overrides
    Lucene.Net.Analysis.TokenStream.IncrementToken()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Lucene.Net.Analysis.Tokenizer.Reset()
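
Example

The three overrides above follow the usual TokenStream consumption order: Reset(), then IncrementToken() until it returns false, then End(), with disposal at the end. The following is a minimal sketch of that workflow, not a definitive usage pattern. It assumes the Lucene.Net and Lucene.Net.ICU packages are referenced; the class name ICUTokenizerExample and the sample text are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class; not part of Lucene.Net.
public static class ICUTokenizerExample
{
    public static void Main()
    {
        // The sample text is an assumption made for illustration.
        using (TextReader reader = new StringReader("Hello world"))
        // ICUTokenizer(TextReader) uses the default script-specific
        // handling and the default attribute factory.
        using (var tokenizer = new ICUTokenizer(reader))
        {
            // Register the attributes to read before consuming the stream.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();

            // Reset() must be called before the first IncrementToken().
            tokenizer.Reset();

            // IncrementToken() returns false once the input is exhausted.
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(
                    term.ToString() + " [" + offset.StartOffset + ", " + offset.EndOffset + ")");
            }

            // End() performs end-of-stream operations such as setting the final offset.
            tokenizer.End();
        }
    }
}
```

To tailor segmentation, the ICUTokenizer(TextReader, ICUTokenizerConfig) overload can be used in place of the single-argument constructor; the rest of the loop stays the same.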

    Implements

    System.IDisposable

    See Also

    ICUTokenizerConfig
    Copyright © 2021 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.