Class CharTokenizer
An abstract base class for simple, character-oriented tokenizers.
You must specify the required LuceneVersion when creating a CharTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token codepoints. See IsTokenChar(Int32) and Normalize(Int32) for details.
A new Char
As of Lucene 3.1 each Char
Note: If you use a subclass of Char
Inheritance
Implements
Inherited Members
Namespace: Lucene.Net.Analysis.Util
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public abstract class CharTokenizer : Tokenizer, IDisposable
Constructors
CharTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Creates a new CharTokenizer instance.
Declaration
public CharTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene version to match |
AttributeSource.AttributeFactory | factory | the attribute factory to use for this Tokenizer |
TextReader | input | the input to split up into tokens |
CharTokenizer(LuceneVersion, TextReader)
Creates a new CharTokenizer instance.
Declaration
public CharTokenizer(LuceneVersion matchVersion, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene version to match |
TextReader | input | the input to split up into tokens |
Methods
End()
Declaration
public override sealed void End()
Overrides
IncrementToken()
Declaration
public override sealed bool IncrementToken()
Returns
Type | Description |
---|---|
System.Boolean | |
Overrides
IsTokenChar(Int32)
Returns true iff a codepoint should be included in a token. This tokenizer generates as tokens adjacent sequences of codepoints which satisfy this predicate. Codepoints for which this is false are used to define token boundaries and are not included in tokens.
Declaration
protected abstract bool IsTokenChar(int c)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | c | |
Returns
Type | Description |
---|---|
System.Boolean | |
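The predicate contract described for IsTokenChar(Int32) can be illustrated without Lucene.NET at all. The following is a minimal sketch, not the library's implementation: `CodePointSplitter` and its `Split` method are hypothetical names introduced here, and the sketch ignores everything CharTokenizer does beyond splitting (offsets, attributes, buffering). It walks the input one codepoint at a time, keeps runs of codepoints for which the predicate is true, and treats all other codepoints as boundaries.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class CodePointSplitter
{
    // Splits text into tokens made of adjacent codepoints accepted by isTokenChar.
    // Each accepted codepoint is passed through normalize before being appended.
    public static List<string> Split(
        string text, Func<int, bool> isTokenChar, Func<int, int> normalize)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        for (int i = 0; i < text.Length; )
        {
            // Reads one full codepoint; throws ArgumentException on a lone surrogate.
            int cp = char.ConvertToUtf32(text, i);
            i += cp > 0xFFFF ? 2 : 1; // supplementary codepoints occupy two UTF-16 units

            if (isTokenChar(cp))
            {
                current.Append(char.ConvertFromUtf32(normalize(cp)));
            }
            else if (current.Length > 0)
            {
                // A rejected codepoint closes the current token and is discarded.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }

        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

For example, `CodePointSplitter.Split("Hello, World 42", cp => char.IsLetter(char.ConvertFromUtf32(cp), 0), cp => cp)` yields the two tokens `Hello` and `World`: the comma, spaces, and digits fail the predicate, so they only mark boundaries.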
Normalize(Int32)
Called on each token character to normalize it before it is added to the token. The default implementation does nothing. Subclasses may use this to, e.g., lowercase tokens.
Declaration
protected virtual int Normalize(int c)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | c | |
Returns
Type | Description |
---|---|
System.Int32 | |
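A subclass that overrides both members might look like the sketch below. It relies only on the constructor and method signatures listed on this page; the class name `LowercaseLetterTokenizer` is hypothetical, the `Lucene.Net.Util` namespace for LuceneVersion is an assumption not stated here, and the letter test and lowercasing use plain .NET `char` helpers rather than anything Lucene.NET provides.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util; // assumed location of LuceneVersion

// Hypothetical subclass for illustration; the name is not part of Lucene.NET.
public sealed class LowercaseLetterTokenizer : CharTokenizer
{
    public LowercaseLetterTokenizer(LuceneVersion matchVersion, TextReader input)
        : base(matchVersion, input)
    {
    }

    // Tokens are runs of letters; every other codepoint is a boundary.
    protected override bool IsTokenChar(int c)
    {
        // char.ConvertFromUtf32 rejects surrogate codepoints, so exclude them first.
        if (c >= 0xD800 && c <= 0xDFFF)
        {
            return false;
        }
        return char.IsLetter(char.ConvertFromUtf32(c), 0);
    }

    // Lowercases each accepted codepoint before it is added to the token.
    protected override int Normalize(int c)
    {
        string lower = char.ConvertFromUtf32(c).ToLowerInvariant();
        return char.ConvertToUtf32(lower, 0);
    }
}
```

Because Normalize(Int32) is documented as being called on token characters only, the surrogate guard in IsTokenChar(Int32) also keeps invalid codepoints away from the lowercasing step.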
Reset()
Declaration
public override void Reset()