    Class Tokenizer

    A Tokenizer is a TokenStream whose input is a TextReader.

    This is an abstract class; subclasses must override IncrementToken().

    NOTE: Subclasses overriding IncrementToken() must call ClearAttributes() before setting attributes.
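
    The note above can be illustrated with a minimal sketch of a subclass. LineTokenizer is a hypothetical class written for this example, not part of Lucene.NET, and it assumes the ICharTermAttribute interface from the Lucene.Net.Analysis.TokenAttributes namespace. It emits each line of the input as one token and calls ClearAttributes() before setting any attribute.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class: emits each line of the input as a single token.
public sealed class LineTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;

    public LineTokenizer(TextReader input)
        : base(input)
    {
        // Register the attribute once; the same instance is reused for every token.
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        // Clear values left over from the previous token before setting anything.
        ClearAttributes();

        string line = m_input.ReadLine();
        if (line == null)
        {
            return false; // end of input
        }

        termAtt.SetEmpty().Append(line);
        return true;
    }
}
```

    A consumer would typically call Reset(), loop on IncrementToken(), then call End() and Dispose().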

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    Implements
    IDisposable
    Inherited Members
    TokenStream.IncrementToken()
    TokenStream.End()
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis
    Assembly: Lucene.Net.dll
    Syntax
    public abstract class Tokenizer : TokenStream, IDisposable

    Constructors


    Tokenizer(AttributeSource.AttributeFactory, TextReader)

    Construct a token stream processing the given input using the given AttributeSource.AttributeFactory.

    Declaration
    protected Tokenizer(AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader input

    Tokenizer(TextReader)

    Construct a token stream processing the given input.

    Declaration
    protected Tokenizer(TextReader input)
    Parameters
    Type Name Description
    TextReader input

    Fields


    m_input

    The text source for this Tokenizer.

    Declaration
    protected TextReader m_input
    Field Value
    Type Description
    TextReader

    Methods


    CorrectOffset(Int32)

    Returns the corrected offset. If m_input is a CharFilter subclass, this method calls CharFilter.CorrectOffset(Int32) on it; otherwise it returns currentOff unchanged.

    Declaration
    protected int CorrectOffset(int currentOff)
    Parameters
    Type Name Description
    System.Int32 currentOff

    offset as seen in the output

    Returns
    Type Description
    System.Int32

    corrected offset based on the input

    See Also
    CharFilter.CorrectOffset(Int32)
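
    As a rough sketch of where CorrectOffset(Int32) usually fits, the hypothetical tokenizer below (SimpleWhitespaceTokenizer is an illustrative name, not a Lucene.NET class) splits on whitespace and passes every start and end offset through CorrectOffset before storing it. It assumes the ICharTermAttribute and IOffsetAttribute interfaces from Lucene.Net.Analysis.TokenAttributes. With a plain TextReader the offsets come back unchanged; a correction only takes effect when the input is wrapped in a CharFilter.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class: whitespace-separated tokens with corrected offsets.
public sealed class SimpleWhitespaceTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private readonly IOffsetAttribute offsetAtt;
    private int consumed; // number of characters read from m_input so far

    public SimpleWhitespaceTokenizer(TextReader input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
        offsetAtt = AddAttribute<IOffsetAttribute>();
    }

    public override bool IncrementToken()
    {
        ClearAttributes();

        int start = -1;
        int c;
        while ((c = m_input.Read()) != -1)
        {
            consumed++;
            if (char.IsWhiteSpace((char)c))
            {
                if (start >= 0) break; // whitespace ends the current token
                continue;              // skip leading whitespace
            }
            if (start < 0) start = consumed - 1;
            termAtt.Append((char)c);
        }

        if (start < 0)
        {
            return false; // no more tokens
        }

        int end = start + termAtt.Length;
        // Offsets are measured on the filtered text; CorrectOffset maps them
        // back to positions in the original input.
        offsetAtt.SetOffset(CorrectOffset(start), CorrectOffset(end));
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        consumed = 0;
    }

    public override void End()
    {
        base.End();
        // Report the final offset, corrected in the same way.
        int finalOffset = CorrectOffset(consumed);
        offsetAtt.SetOffset(finalOffset, finalOffset);
    }
}
```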

    Dispose(Boolean)

    Releases resources associated with this stream.

    If you override this method, always call base.Dispose(disposing); otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw on reuse).

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing
    Overrides
    TokenStream.Dispose(Boolean)
    Remarks

    NOTE: The default implementation closes the input TextReader, so be sure to call base.Dispose(disposing) when overriding this method.
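
    One way to follow this advice is to put the base call in a finally block, so it runs even if the subclass cleanup throws. The sketch below uses a hypothetical WholeInputTokenizer class with an IsReleased flag standing in for whatever extra resource a real subclass might own; neither name comes from Lucene.NET.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example class: emits the whole input as one token.
public sealed class WholeInputTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private bool done;

    // Stand-in for a subclass-owned resource (illustrative only).
    public bool IsReleased { get; private set; }

    public WholeInputTokenizer(TextReader input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        ClearAttributes();
        if (done)
        {
            return false;
        }
        done = true;

        string text = m_input.ReadToEnd();
        if (text.Length == 0)
        {
            return false;
        }
        termAtt.SetEmpty().Append(text);
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        done = false;
    }

    protected override void Dispose(bool disposing)
    {
        try
        {
            if (disposing)
            {
                // Release subclass-owned resources here.
                IsReleased = true;
            }
        }
        finally
        {
            // Always let Tokenizer close the input and reset its own state.
            base.Dispose(disposing);
        }
    }
}
```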


    Reset()

    Declaration
    public override void Reset()
    Overrides
    TokenStream.Reset()

    SetReader(TextReader)

    Expert: Sets a new reader on this Tokenizer. Typically, an analyzer (in its tokenStream method) uses this to reuse a previously created tokenizer.

    Declaration
    public void SetReader(TextReader input)
    Parameters
    Type Name Description
    TextReader input
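
    A minimal sketch of reusing a tokenizer by hand is shown below. It assumes the WhitespaceTokenizer class from the separate Lucene.Net.Analysis.Common package and the LuceneVersion.LUCENE_48 constant. It also assumes the behavior described under Dispose(Boolean) above, where disposing the stream after each pass is what resets the tokenizer for its next reader. TokenizerReuseExample and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper showing one tokenizer instance serving two inputs.
public static class TokenizerReuseExample
{
    // Runs one full pass: Reset, IncrementToken until false, End, Dispose.
    public static List<string> Consume(Tokenizer tokenizer)
    {
        var terms = new List<string>();
        ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
        try
        {
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        finally
        {
            // Closes the current reader and resets internal state for reuse.
            tokenizer.Dispose();
        }
        return terms;
    }

    public static List<string> Run()
    {
        Tokenizer tokenizer = new WhitespaceTokenizer(
            LuceneVersion.LUCENE_48, new StringReader("first document"));
        List<string> all = Consume(tokenizer);

        // Point the same instance at new input instead of allocating another one.
        tokenizer.SetReader(new StringReader("second text"));
        all.AddRange(Consume(tokenizer));
        return all;
    }
}
```

    In normal use an Analyzer performs this reuse internally, so calling SetReader directly is only needed when managing tokenizers by hand.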

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)