Class Tokenizer
A Tokenizer is a TokenStream whose input is a TextReader.
This is an abstract class; subclasses must override IncrementToken().
NOTE: Subclasses overriding IncrementToken() must call ClearAttributes() before setting attributes.
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.dll
Syntax
public abstract class Tokenizer : TokenStream, IDisposable
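The contract described above (override IncrementToken(), and call ClearAttributes() before setting any attribute) can be sketched with a small subclass. This is a minimal illustration, not part of Lucene.Net. The class name LetterRunTokenizer is hypothetical, and the code assumes Lucene.Net 4.8, where ICharTermAttribute and IOffsetAttribute live in Lucene.Net.Analysis.TokenAttributes. It also overrides End() and Reset(), which a stateful tokenizer generally needs.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example tokenizer (not part of Lucene.Net): emits each maximal
// run of letters as one token. It reads one UTF-16 char at a time for brevity,
// so surrogate pairs are never treated as letters.
public sealed class LetterRunTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private readonly IOffsetAttribute offsetAtt;
    private int offset; // chars consumed from m_input since the last Reset()

    public LetterRunTokenizer(TextReader input) : base(input)
    {
        // Register the attributes this tokenizer populates.
        termAtt = AddAttribute<ICharTermAttribute>();
        offsetAtt = AddAttribute<IOffsetAttribute>();
    }

    public override bool IncrementToken()
    {
        ClearAttributes(); // required before setting any attribute

        int start = -1;
        int c;
        while ((c = m_input.Read()) != -1)
        {
            offset++;
            if (char.IsLetter((char)c))
            {
                if (start == -1) start = offset - 1; // first letter of a token
                termAtt.Append((char)c);
            }
            else if (start != -1)
            {
                break; // the non-letter just read ends the current token
            }
        }

        if (start == -1) return false; // end of input, no further tokens

        // End offset is exclusive: stop before the terminating non-letter.
        int end = c == -1 ? offset : offset - 1;
        // Map offsets back to the original text in case m_input is a CharFilter.
        offsetAtt.SetOffset(CorrectOffset(start), CorrectOffset(end));
        return true;
    }

    public override void End()
    {
        base.End();
        // Report the final offset (end of input) after the last token.
        int finalOffset = CorrectOffset(offset);
        offsetAtt.SetOffset(finalOffset, finalOffset);
    }

    public override void Reset()
    {
        base.Reset(); // always call the base implementation first
        offset = 0;
    }
}
```

Consumed with Reset(), then IncrementToken() until it returns false, then End(), the input "Hi, there!" should yield "Hi" with offsets [0,2) and "there" with [4,9), followed by a final offset of 10. Reading one char at a time keeps the sketch short; a production tokenizer would read m_input through a buffer.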
Constructors
Tokenizer(AttributeFactory, TextReader)
Construct a token stream processing the given input using the given AttributeSource.AttributeFactory.
Declaration
protected Tokenizer(AttributeSource.AttributeFactory factory, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | The factory used to create this stream's attribute instances. |
TextReader | input | The text to tokenize. |
Tokenizer(TextReader)
Construct a token stream processing the given input.
Declaration
protected Tokenizer(TextReader input)
Parameters
Type | Name | Description |
---|---|---|
TextReader | input | The text to tokenize. |
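A subclass typically mirrors both constructors and chains each one to the matching base overload. In the sketch below (hypothetical class WholeInputTokenizer, Lucene.Net 4.8 assumed), attributes are registered in the constructor body, after base(...) has run, so they are created by whichever attribute factory the base constructor received.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical example: emits the entire input as a single token.
// It exposes both Tokenizer constructor overloads.
public sealed class WholeInputTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private bool emitted; // true once the single token has been returned

    // Chains to Tokenizer(TextReader); the base class picks the attribute factory.
    public WholeInputTokenizer(TextReader input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    // Chains to Tokenizer(AttributeFactory, TextReader); attributes added below
    // are created by the supplied factory.
    public WholeInputTokenizer(AttributeSource.AttributeFactory factory, TextReader input)
        : base(factory, input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        if (emitted) return false;
        ClearAttributes();
        string text = m_input.ReadToEnd(); // consume all remaining input
        termAtt.CopyBuffer(text.ToCharArray(), 0, text.Length);
        emitted = true;
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        emitted = false;
    }
}
```

Because the whole input becomes one token, an empty input yields a single empty token in this sketch; a real tokenizer would likely return false in that case.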
Fields
m_input
The text source for this Tokenizer.
Declaration
protected TextReader m_input
Field Value
Type | Description |
---|---|
TextReader |
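m_input is an ordinary TextReader, so subclasses read it with the usual TextReader methods. The hypothetical FixedChunkTokenizer below reads it through a char[] buffer, looping because TextReader.Read(char[], int, int) may return fewer characters than requested before the end of input (it returns 0 at the end). Lucene.Net 4.8 is assumed.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example: splits the input into consecutive chunks of at most
// chunkSize chars, reading m_input through a reusable buffer.
public sealed class FixedChunkTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private readonly IOffsetAttribute offsetAtt;
    private readonly char[] buffer;
    private int offset; // chars consumed from m_input since the last Reset()

    public FixedChunkTokenizer(TextReader input, int chunkSize) : base(input)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        buffer = new char[chunkSize];
        termAtt = AddAttribute<ICharTermAttribute>();
        offsetAtt = AddAttribute<IOffsetAttribute>();
    }

    public override bool IncrementToken()
    {
        ClearAttributes();
        // Fill the buffer until it is full or the reader reports end of input (0).
        int filled = 0;
        int n;
        while (filled < buffer.Length && (n = m_input.Read(buffer, filled, buffer.Length - filled)) > 0)
            filled += n;
        if (filled == 0) return false;

        termAtt.CopyBuffer(buffer, 0, filled);
        offsetAtt.SetOffset(CorrectOffset(offset), CorrectOffset(offset + filled));
        offset += filled;
        return true;
    }

    public override void End()
    {
        base.End();
        int finalOffset = CorrectOffset(offset);
        offsetAtt.SetOffset(finalOffset, finalOffset);
    }

    public override void Reset()
    {
        base.Reset();
        offset = 0;
    }
}
```

With a chunk size of 3, the input "abcdefg" yields "abc" [0,3), "def" [3,6), and "g" [6,7).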
Methods
CorrectOffset(int)
Returns the corrected offset. If m_input is a CharFilter subclass, this method calls its CorrectOffset(int); otherwise it returns currentOff unchanged.
Declaration
protected int CorrectOffset(int currentOff)
Parameters
Type | Name | Description |
---|---|---|
int | currentOff | offset as seen in the output |
Returns
Type | Description |
---|---|
int | corrected offset based on the input |
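To see what CorrectOffset(int) buys, wrap the input in a CharFilter that changes its length. This sketch assumes the separate Lucene.Net.Analysis.Common 4.8 package, which provides MappingCharFilter, NormalizeCharMap and WhitespaceTokenizer; WhitespaceTokenizer passes the offsets it reports through CorrectOffset. CorrectOffsetDemo is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CorrectOffsetDemo
{
    // Tokenizes `text` after replacing "&amp;" with "&", returning each term
    // with the offsets the tokenizer reports (relative to the original text).
    public static List<(string Term, int Start, int End)> Tokenize(string text)
    {
        var builder = new NormalizeCharMap.Builder();
        builder.Add("&amp;", "&");
        TextReader filtered = new MappingCharFilter(builder.Build(), new StringReader(text));

        var result = new List<(string Term, int Start, int End)>();
        using (var tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, filtered))
        {
            var term = tokenizer.AddAttribute<ICharTermAttribute>();
            var offsets = tokenizer.AddAttribute<IOffsetAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
                result.Add((term.ToString(), offsets.StartOffset, offsets.EndOffset));
            tokenizer.End();
        }
        return result;
    }
}
```

For "R&amp;D lab", the filter hands the tokenizer "R&D lab", where "lab" occupies positions 4 to 7; the reported offsets are 8 to 11, which point at "lab" in the original string.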
Dispose(bool)
Releases resources associated with this stream.
If you override this method, always call base.Dispose(disposing); otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw InvalidOperationException on reuse).
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing | true when called from Dispose(); false when called from a finalizer. |
Overrides
TokenStream.Dispose(bool)
Remarks
NOTE: The default implementation closes the input TextReader, so be sure to call base.Dispose(disposing) when overriding this method.
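An override that releases its own state should call base.Dispose(disposing) in a finally block, so the base implementation runs even if the subclass cleanup throws. BufferedWhitespaceTokenizer is a hypothetical example that buffers its terms in a queue; Lucene.Net 4.8 is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical example: splits the whole input on whitespace up front and
// keeps the pending terms in a queue.
public sealed class BufferedWhitespaceTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private Queue<string> pending; // null until the first IncrementToken() after Reset()

    public BufferedWhitespaceTokenizer(TextReader input) : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        if (pending == null)
        {
            // A null separator array splits on whitespace.
            string[] parts = m_input.ReadToEnd().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            pending = new Queue<string>(parts);
        }
        if (pending.Count == 0) return false;

        ClearAttributes();
        string next = pending.Dequeue();
        termAtt.CopyBuffer(next.ToCharArray(), 0, next.Length);
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        pending = null;
    }

    protected override void Dispose(bool disposing)
    {
        try
        {
            if (disposing)
                pending = null; // drop buffered terms owned by this tokenizer
        }
        finally
        {
            // Always run the base implementation: it closes m_input and resets
            // Tokenizer's internal state.
            base.Dispose(disposing);
        }
    }
}
```

Because base.Dispose(disposing) always runs, the instance can afterwards be given a new reader with SetReader(TextReader) and reset for another pass.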
Reset()
This method is called by a consumer before it begins consumption using IncrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh. If you override this method, always call base.Reset(); otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw InvalidOperationException on further usage).
Declaration
public override void Reset()
Overrides
TokenStream.Reset()
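From the consumer side, Reset() comes first, followed by IncrementToken() until it returns false, then End(), and finally Dispose(). The helper names below are hypothetical, and the concrete tokenizer used, WhitespaceTokenizer, assumes the Lucene.Net.Analysis.Common 4.8 package.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ConsumerWorkflow
{
    // Consumes a token stream in the order Reset() -> IncrementToken() until
    // false -> End() -> Dispose(), collecting the term text of each token.
    public static List<string> ReadTerms(TokenStream stream)
    {
        var terms = new List<string>();
        // The attribute is obtained once; IncrementToken() updates it in place.
        var term = stream.AddAttribute<ICharTermAttribute>();
        try
        {
            stream.Reset(); // must precede the first IncrementToken()
            while (stream.IncrementToken())
                terms.Add(term.ToString());
            stream.End();
        }
        finally
        {
            stream.Dispose();
        }
        return terms;
    }

    public static List<string> Example() =>
        ReadTerms(new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader("quick brown fox")));
}
```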
SetReader(TextReader)
Expert: Set a new reader on the Tokenizer. Typically, an analyzer (in its GetTokenStream method) uses this to reuse a previously created tokenizer.
Declaration
public void SetReader(TextReader input)
Parameters
Type | Name | Description |
---|---|---|
TextReader | input | The new text to tokenize. |
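A sketch of the reuse pattern an analyzer follows: consume the tokenizer, dispose it, hand it a new reader with SetReader(TextReader), then call Reset() before consuming again. It assumes, consistent with the Dispose(bool) notes above, that the tokenizer must be disposed before SetReader is called. WhitespaceTokenizer comes from the Lucene.Net.Analysis.Common 4.8 package, and TokenizerReuse is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenizerReuse
{
    public static (List<string> First, List<string> Second) Run()
    {
        var tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader("first input"));
        // Attributes belong to the tokenizer instance, so they stay valid across reuse.
        var term = tokenizer.AddAttribute<ICharTermAttribute>();

        List<string> first = Consume(tokenizer, term);

        // Reuse the same instance on new text instead of allocating a new tokenizer.
        tokenizer.SetReader(new StringReader("second input here"));
        List<string> second = Consume(tokenizer, term);
        return (first, second);
    }

    private static List<string> Consume(Tokenizer tokenizer, ICharTermAttribute term)
    {
        var terms = new List<string>();
        tokenizer.Reset();
        while (tokenizer.IncrementToken())
            terms.Add(term.ToString());
        tokenizer.End();
        tokenizer.Dispose(); // done before the next SetReader() call
        return terms;
    }
}
```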