    Class IndicTokenizer

    Simple Tokenizer for text in Indian Languages.

    Inheritance
    object
    AttributeSource
    TokenStream
    Tokenizer
    CharTokenizer
    IndicTokenizer
    Implements
    IDisposable
    Inherited Members
    CharTokenizer.IncrementToken()
    CharTokenizer.End()
    CharTokenizer.Reset()
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis.In
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    [Obsolete("(3.6) Use StandardTokenizer instead.")]
    public sealed class IndicTokenizer : CharTokenizer, IDisposable

    Constructors

    IndicTokenizer(LuceneVersion, AttributeFactory, TextReader)

    Creates a new IndicTokenizer that reads from the given input, using the supplied attribute factory.

    Declaration
    public IndicTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
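    Type                              Name          Description
    LuceneVersion                     matchVersion  Lucene version to match.
    AttributeSource.AttributeFactory  factory       The attribute factory to use for this tokenizer.
    TextReader                        input         The input to split into tokens.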

    IndicTokenizer(LuceneVersion, TextReader)

    Creates a new IndicTokenizer that reads from the given input.

    Declaration
    public IndicTokenizer(LuceneVersion matchVersion, TextReader input)
    Parameters
    Type           Name          Description
    LuceneVersion  matchVersion  Lucene version to match.
    TextReader     input         The input to split into tokens.
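
    A minimal usage sketch for the two-argument constructor, assuming a project that references the Lucene.Net.Analysis.Common package. The IndicTokenizerUsage class, the sample text, and the choice of LuceneVersion.LUCENE_48 are illustrative assumptions, not part of this reference. Because the class is marked obsolete, the sketch suppresses the resulting compiler warning; new code should probably follow the attribute's advice and use StandardTokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.In;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, introduced only for illustration.
public static class IndicTokenizerUsage
{
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();

        // IndicTokenizer is marked [Obsolete]; silence the warning for this sketch.
#pragma warning disable CS0618
        using (var tokenizer = new IndicTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
#pragma warning restore CS0618
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advances to the next token, false at end of input
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                    // finalizes end-of-stream state
        }                                       // Dispose() releases the underlying reader

        return tokens;
    }

    public static void Main()
    {
        // Sample input is an assumption chosen for the example.
        foreach (string token in Tokenize("नमस्ते दुनिया"))
        {
            Console.WriteLine(token);
        }
    }
}
```

    The Reset, IncrementToken, End, Dispose sequence follows the inherited members listed above; the three-argument constructor is used the same way, with an AttributeSource.AttributeFactory passed as the second argument.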

    Methods

    IsTokenChar(int)

    Returns true if and only if a code point should be included in a token. The tokenizer emits each maximal run of adjacent code points that satisfy this predicate as one token. Code points for which it returns false mark token boundaries and are not included in any token.

    Declaration
    protected override bool IsTokenChar(int c)
    Parameters
    Type  Name  Description
    int   c     The code point to test.
    Returns
    Type  Description
    bool  true if the code point should be part of a token; otherwise, false.
    Overrides
    CharTokenizer.IsTokenChar(int)
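
    The predicate-driven splitting described for IsTokenChar(int) can be sketched without any Lucene.NET dependency. The code below is an illustration of the general technique only: the PredicateTokenizerSketch class and its IsLetterOrMark predicate (letters plus combining marks) are hypothetical and are not claimed to match the predicate IndicTokenizer actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical, self-contained sketch; not the library's implementation.
public static class PredicateTokenizerSketch
{
    // Assumed example predicate: accept letters and combining marks.
    public static bool IsLetterOrMark(int codePoint)
    {
        // Lone surrogates are not valid code points; treat them as boundaries.
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false;

        UnicodeCategory category =
            CharUnicodeInfo.GetUnicodeCategory(char.ConvertFromUtf32(codePoint), 0);

        switch (category)
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
                return true;
            default:
                return false;
        }
    }

    // Emits each maximal run of code points accepted by the predicate as one token.
    public static List<string> Tokenize(string text, Func<int, bool> isTokenChar)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        for (int i = 0; i < text.Length; )
        {
            // Read one code point, which may span two UTF-16 chars.
            bool isPair = char.IsSurrogatePair(text, i);
            int codePoint = isPair ? char.ConvertToUtf32(text, i) : text[i];
            int width = isPair ? 2 : 1;

            if (isTokenChar(codePoint))
            {
                current.Append(text, i, width);   // extend the current token
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());   // boundary reached: flush the token
                current.Clear();
            }
            i += width;
        }

        if (current.Length > 0) tokens.Add(current.ToString()); // flush the trailing token
        return tokens;
    }

    public static void Main()
    {
        // Under the assumed predicate, punctuation, spaces, and digits act as boundaries.
        foreach (string token in Tokenize("नमस्ते, दुनिया! abc123", IsLetterOrMark))
        {
            Console.WriteLine(token);
        }
    }
}
```

    Accepting combining marks in the assumed predicate keeps vowel signs and the virama attached to the consonants they modify, which is why a letters-only predicate would tend to split Indic words at those code points.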

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.