A TokenStream enumerates the sequence of tokens, either from {@link Field}s of a {@link Document} or from query text.

This is an abstract class. Concrete subclasses are:

  • {@link Tokenizer}, a TokenStream whose input is a Reader; and
  • {@link TokenFilter}, a TokenStream whose input is another TokenStream.
A new TokenStream API was introduced in Lucene 2.9. This API has moved from being {@link Token} based to {@link Attribute} based. While {@link Token} still exists in 2.9 as a convenience class, the preferred way to store the information of a {@link Token} is to use {@link AttributeImpl}s.

TokenStream now extends {@link AttributeSource}, which provides access to all of the token {@link Attribute}s for the TokenStream. Note that only one instance per {@link AttributeImpl} is created and reused for every token. This approach reduces object creation and allows local caching of references to the {@link AttributeImpl}s. See {@link #IncrementToken()} for further details.
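
As a rough illustration of the attribute reuse described above, the following sketch shows a filter that looks up its attribute once in the constructor and then modifies the same instance for every token. UpperCaseFilter is a hypothetical class written for this example, not part of Lucene.Net, and the code assumes the non-generic AddAttribute(Type) signature and the TermBuffer()/TermLength() accessors of the 2.9.x line.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical filter for illustration only: upper-cases each term in place.
public sealed class UpperCaseFilter : TokenFilter
{
    // Looked up once and cached; the same instance is reused for every token.
    private readonly TermAttribute termAtt;

    public UpperCaseFilter(TokenStream input) : base(input)
    {
        // Assumption: the 2.9.x non-generic AddAttribute(Type) signature.
        // Attributes are registered here, during instantiation.
        termAtt = (TermAttribute) AddAttribute(typeof(TermAttribute));
    }

    public override bool IncrementToken()
    {
        // Advance the wrapped stream; stop when it is exhausted.
        if (!input.IncrementToken())
        {
            return false;
        }

        // Modify the shared term buffer in place; no new object is allocated.
        char[] buffer = termAtt.TermBuffer();
        int length = termAtt.TermLength();
        for (int i = 0; i < length; i++)
        {
            buffer[i] = char.ToUpperInvariant(buffer[i]);
        }
        return true;
    }
}
```

Because the filter and its input share one AttributeSource, the change to the term buffer is visible to any downstream filter or consumer that holds a reference to the same attribute.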

The workflow of the new TokenStream API is as follows:
  1. Instantiation of TokenStream/{@link TokenFilter}s which add/get attributes to/from the {@link AttributeSource}.
  2. The consumer calls {@link TokenStream#Reset()}.
  3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
  4. The consumer calls {@link #IncrementToken()} until it returns false and consumes the attributes after each call.
  5. The consumer calls {@link #End()} so that any end-of-stream operations can be performed.
  6. The consumer calls {@link #Close()} to release any resources when finished using the TokenStream.
To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in {@link #IncrementToken()}.
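
The six workflow steps above can be sketched from the consumer's side roughly as follows. This is a minimal example, not a definitive implementation: TokenStreamWorkflow and CollectTerms are hypothetical names, the chain of WhitespaceTokenizer and LowerCaseFilter is chosen only for illustration, and the non-generic AddAttribute(Type) call assumes the 2.9.x API.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper class for illustration only.
public static class TokenStreamWorkflow
{
    public static List<string> CollectTerms(string text)
    {
        List<string> terms = new List<string>();

        // 1. Instantiate the chain; each component registers its
        //    attributes with the shared AttributeSource in its constructor.
        TokenStream stream = new LowerCaseFilter(
            new WhitespaceTokenizer(new StringReader(text)));
        try
        {
            // 2. Reset the stream before consuming it.
            stream.Reset();

            // 3. Store a local reference to each attribute of interest.
            //    Assumption: the 2.9.x non-generic AddAttribute(Type) signature.
            TermAttribute termAtt =
                (TermAttribute) stream.AddAttribute(typeof(TermAttribute));

            // 4. Advance until IncrementToken() returns false, reading the
            //    attribute after each call. The instance is reused, so copy
            //    out any value that must outlive the current token.
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.Term());
            }

            // 5. Let the stream perform end-of-stream operations.
            stream.End();
        }
        finally
        {
            // 6. Release resources, even if consumption failed.
            stream.Close();
        }
        return terms;
    }
}
```

The try/finally block is a design choice of this sketch rather than a requirement stated above; it simply ensures that Close() runs if IncrementToken() throws.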

You can find some example code for the new API in the analysis package level Javadoc.

Sometimes it is desirable to capture the current state of a TokenStream, e.g. for buffering purposes (see {@link CachingTokenFilter}, {@link TeeSinkTokenFilter}). For this use case {@link AttributeSource#CaptureState} and {@link AttributeSource#RestoreState} can be used.
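
A minimal sketch of such buffering is shown below: it captures one state per token and later restores each state to read the term back. StateBufferingSketch and ReplayTerms are hypothetical names introduced for this example, and the code assumes that the captured state is exposed as the nested type AttributeSource.State in Lucene.Net.Util, as in the 2.9.x line.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Util;

// Hypothetical helper class for illustration only.
public static class StateBufferingSketch
{
    public static List<string> ReplayTerms(string text)
    {
        TokenStream stream = new WhitespaceTokenizer(new StringReader(text));
        // Assumption: the 2.9.x non-generic AddAttribute(Type) signature.
        TermAttribute termAtt =
            (TermAttribute) stream.AddAttribute(typeof(TermAttribute));

        List<AttributeSource.State> buffered = new List<AttributeSource.State>();
        List<string> replayed = new List<string>();
        try
        {
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Snapshot all attribute values for the current token.
                buffered.Add(stream.CaptureState());
            }
            stream.End();

            foreach (AttributeSource.State state in buffered)
            {
                // Copy the snapshot back into the shared attribute instances.
                stream.RestoreState(state);
                replayed.Add(termAtt.Term());
            }
        }
        finally
        {
            stream.Close();
        }
        return replayed;
    }
}
```

Capturing a state is needed here because the attribute instances are reused for every token; holding on to termAtt alone would only ever expose the most recent token.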

Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net (in Lucene.Net.dll) Version: 2.9.4.1

Syntax

C#
public abstract class TokenStream : AttributeSource
Visual Basic
Public MustInherit Class TokenStream _
	Inherits AttributeSource
Visual C++
public ref class TokenStream abstract : public AttributeSource

Inheritance Hierarchy

See Also