    Class Tokenizer

    A Tokenizer is a TokenStream whose input is a TextReader.

    This is an abstract class; subclasses must override IncrementToken().

    NOTE: Subclasses overriding IncrementToken() must call ClearAttributes() before setting attributes.
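
The contract described above can be illustrated with a minimal sketch of a subclass. WholeInputTokenizer is a hypothetical name used only for illustration (it is not a Lucene.Net type); it emits the entire input as a single token and assumes the attribute interfaces live in Lucene.Net.Analysis.TokenAttributes. The class is sealed because TokenStream implementations are expected to be sealed or to have a sealed IncrementToken().

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example type; it is not part of Lucene.Net.
public sealed class WholeInputTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private bool done; // true once the single token has been emitted

    public WholeInputTokenizer(TextReader input)
        : base(input)
    {
        // Attributes are registered once, in the constructor.
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        if (done)
        {
            return false;
        }

        // Required by the Tokenizer contract: clear all attributes
        // before setting any of them for the new token.
        ClearAttributes();
        done = true;

        string text = m_input.ReadToEnd();
        if (text.Length == 0)
        {
            return false; // empty input produces no token
        }

        termAtt.Append(text);
        return true;
    }

    public override void Reset()
    {
        base.Reset(); // always call the base implementation first
        done = false;
    }
}
```

Reading the whole input into memory keeps the sketch short; a production tokenizer would normally read incrementally through a buffer.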
    Inheritance
    object
    AttributeSource
    TokenStream
    Tokenizer
    Implements
    IDisposable
    Inherited Members
    TokenStream.IncrementToken()
    TokenStream.End()
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis
    Assembly: Lucene.Net.dll
    Syntax
    public abstract class Tokenizer : TokenStream, IDisposable

    Constructors

    Tokenizer(AttributeFactory, TextReader)

    Construct a token stream processing the given input using the given AttributeSource.AttributeFactory.

    Declaration
    protected Tokenizer(AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader input

    Tokenizer(TextReader)

    Construct a token stream processing the given input.

    Declaration
    protected Tokenizer(TextReader input)
    Parameters
    Type Name Description
    TextReader input

    Fields

    m_input

    The text source for this Tokenizer.

    Declaration
    protected TextReader m_input
    Field Value
    Type Description
    TextReader

    Methods

    CorrectOffset(int)

    Returns the corrected offset. If m_input is a CharFilter subclass, this method calls CharFilter.CorrectOffset(int); otherwise it returns currentOff unchanged.

    Declaration
    protected int CorrectOffset(int currentOff)
    Parameters
    Type Name Description
    int currentOff Offset as seen in the output.

    Returns
    Type Description
    int Corrected offset based on the input.

    See Also
    CharFilter.CorrectOffset(int)
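
As a rough sketch of where CorrectOffset(int) fits, the hypothetical tokenizer below (SimpleWhitespaceTokenizer is an illustrative name, not a Lucene.Net type) splits on whitespace and passes every raw offset through CorrectOffset(int) before storing it in the offset attribute. This way the reported offsets refer to the original text even when m_input is a CharFilter that added or removed characters. It assumes IOffsetAttribute.SetOffset(int, int) and ICharTermAttribute.Append(char) from Lucene.Net.Analysis.TokenAttributes.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example type; it is not part of Lucene.Net.
public sealed class SimpleWhitespaceTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private readonly IOffsetAttribute offsetAtt;
    private int charsRead; // characters consumed from m_input so far

    public SimpleWhitespaceTokenizer(TextReader input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
        offsetAtt = AddAttribute<IOffsetAttribute>();
    }

    public override bool IncrementToken()
    {
        ClearAttributes();

        // Skip leading whitespace.
        int c;
        while ((c = m_input.Read()) != -1)
        {
            charsRead++;
            if (!char.IsWhiteSpace((char)c))
            {
                break;
            }
        }
        if (c == -1)
        {
            return false; // end of input
        }

        int start = charsRead - 1; // offset of the first token character
        termAtt.Append((char)c);

        // Collect characters up to the next whitespace or end of input.
        while ((c = m_input.Read()) != -1)
        {
            charsRead++;
            if (char.IsWhiteSpace((char)c))
            {
                break;
            }
            termAtt.Append((char)c);
        }

        int end = start + termAtt.Length;

        // Map offsets in the (possibly filtered) input back to the original text.
        offsetAtt.SetOffset(CorrectOffset(start), CorrectOffset(end));
        return true;
    }

    public override void End()
    {
        base.End();
        // The final offset is corrected in the same way.
        int finalOffset = CorrectOffset(charsRead);
        offsetAtt.SetOffset(finalOffset, finalOffset);
    }

    public override void Reset()
    {
        base.Reset();
        charsRead = 0;
    }
}
```

When m_input is a plain TextReader, CorrectOffset(int) returns its argument unchanged, so the calls are harmless when no CharFilter is involved.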

    Dispose(bool)

    Releases resources associated with this stream.

    If you override this method, always call base.Dispose(disposing), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw InvalidOperationException on reuse).
    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing
    Overrides
    TokenStream.Dispose(bool)
    Remarks

    NOTE: The default implementation closes the input TextReader, so be sure to call base.Dispose(disposing) when overriding this method.
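
A minimal sketch of an override that follows this rule is shown below. LineTokenizer is a hypothetical type (not part of Lucene.Net) that emits one token per input line and keeps a small piece of per-stream state. The try/finally block is one possible way to guarantee that base.Dispose(disposing) runs even if the subclass cleanup throws.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example type; it is not part of Lucene.Net.
public sealed class LineTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private int linesEmitted; // per-stream state owned by this subclass

    public LineTokenizer(TextReader input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        ClearAttributes();

        string line = m_input.ReadLine();
        if (line == null)
        {
            return false; // end of input
        }

        termAtt.Append(line);
        linesEmitted++;
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        linesEmitted = 0;
    }

    protected override void Dispose(bool disposing)
    {
        try
        {
            if (disposing)
            {
                // Subclass-specific cleanup goes here.
                linesEmitted = 0;
            }
        }
        finally
        {
            // Closes the input TextReader and resets the Tokenizer's
            // internal state so the instance can accept a new reader.
            base.Dispose(disposing);
        }
    }
}
```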

    Reset()

    This method is called by a consumer before it begins consumption using IncrementToken().

    Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.

    If you override this method, always call base.Reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw InvalidOperationException on further usage).
    Declaration
    public override void Reset()
    Overrides
    TokenStream.Reset()
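
From the consumer's side, the call order described above can be sketched as follows. TokenizerConsumer and ReadAllTerms are hypothetical helper names introduced for illustration; the sketch assumes the tokenizer exposes its term text through ICharTermAttribute, whose ToString() returns the current term.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper; it is not part of Lucene.Net.
public static class TokenizerConsumer
{
    public static IList<string> ReadAllTerms(Tokenizer tokenizer)
    {
        var terms = new List<string>();

        // Obtain the attribute once; its value changes on every IncrementToken().
        ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

        try
        {
            tokenizer.Reset(); // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End(); // lets the stream record end-of-stream state
        }
        finally
        {
            tokenizer.Dispose(); // releases the underlying TextReader
        }

        return terms;
    }
}
```

Skipping the Reset() call is the usual cause of the InvalidOperationException mentioned above, because the tokenizer's reader is only made available by the base Reset() implementation.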

    SetReader(TextReader)

    Expert: Sets a new reader on the Tokenizer. Typically, an analyzer (in its GetTokenStream method) will use this to reuse a previously created tokenizer.

    Declaration
    public void SetReader(TextReader input)
    Parameters
    Type Name Description
    TextReader input
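
The reuse pattern can be sketched as below. TokenizerReuse and TokenizeAll are hypothetical names used for illustration. The sketch assumes, consistent with the Dispose(bool) and Reset() notes on this page, that each use is closed with Dispose() before the next SetReader(TextReader) call and that Reset() is called again after setting the new reader. Other Lucene.Net versions may use a different per-use close step, so treat this as an outline, not a definitive recipe.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper; it is not part of Lucene.Net.
public static class TokenizerReuse
{
    public static IList<string> TokenizeAll(Tokenizer tokenizer, IEnumerable<string> texts)
    {
        var terms = new List<string>();
        ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

        foreach (string text in texts)
        {
            // Point the existing tokenizer at the next input.
            tokenizer.SetReader(new StringReader(text));
            try
            {
                tokenizer.Reset();
                while (tokenizer.IncrementToken())
                {
                    terms.Add(termAtt.ToString());
                }
                tokenizer.End();
            }
            finally
            {
                // Closes the current reader so that the next
                // SetReader call is accepted.
                tokenizer.Dispose();
            }
        }

        return terms;
    }
}
```

In application code this bookkeeping is normally left to an Analyzer, which caches the tokenizer and calls SetReader(TextReader) on the caller's behalf.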

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.