Class LetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it
defines tokens as maximal strings of adjacent letters, where a letter is any character accepted by IsTokenChar(Int32).
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
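To make the "maximal strings of adjacent letters" rule concrete, the following standalone sketch splits a string at every non-letter code point. It is an illustration only, not the Lucene.NET implementation: `LetterRunSketch` and `SplitAtNonLetters` are hypothetical names, and the sketch assumes a runtime that provides `System.Text.Rune` (.NET Core 3.0 or later).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Lucene.NET: mimics the "maximal runs of letters" rule.
public static class LetterRunSketch
{
    public static List<string> SplitAtNonLetters(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        // EnumerateRunes walks Unicode code points, so surrogate pairs stay intact.
        foreach (Rune rune in text.EnumerateRunes())
        {
            if (Rune.IsLetter(rune))
            {
                // Letters extend the token being built.
                current.Append(rune.ToString());
            }
            else if (current.Length > 0)
            {
                // Any non-letter ends the current token.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }

        // Flush a token that runs to the end of the input.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

For the input `It's 2024, hello` this sketch returns `It`, `s`, and `hello`: the apostrophe, the digits, the comma, and the spaces all act as separators. Because ideographs also count as letters, a run of text written without spaces comes back as one long token, which is the weakness the note above describes.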
You must specify the required LuceneVersion compatibility when creating LetterTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.
Inheritance
System.Object
AttributeSource
TokenStream
Tokenizer
CharTokenizer
LetterTokenizer
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Core
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class LetterTokenizer : CharTokenizer, IDisposable
Constructors
LetterTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.
Declaration
public LetterTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | LuceneVersion to match |
AttributeSource.AttributeFactory | factory | the attribute factory to use for this Tokenizer |
TextReader | in | the input to split up into tokens |
LetterTokenizer(LuceneVersion, TextReader)
Construct a new LetterTokenizer.
Declaration
public LetterTokenizer(LuceneVersion matchVersion, TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | LuceneVersion to match |
TextReader | in | the input to split up into tokens |
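A minimal usage sketch of the two-argument constructor follows. It assumes the Lucene.NET 4.8 packages are referenced and uses `LuceneVersion.LUCENE_48` as the match version; pick whichever constant matches your index. `LetterTokenizerDemo` and `Tokenize` are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical demo class, not part of Lucene.NET.
public static class LetterTokenizerDemo
{
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();

        // LUCENE_48 is an assumed version constant for this sketch.
        using (TextReader reader = new StringReader(text))
        using (var tokenizer = new LetterTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advances to the next token, false at end of input
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                    // performs end-of-stream bookkeeping
        }
        return tokens;
    }
}
```

Calling `LetterTokenizerDemo.Tokenize("It's 2024, hello")` should produce `It`, `s`, and `hello`. The three-argument overload is used the same way, with the attribute factory passed between the version and the reader.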
Methods
IsTokenChar(Int32)
Collects only characters that are letters; any other character ends the current token.
Declaration
protected override bool IsTokenChar(int c)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | c |
Returns
Type | Description |
---|---|
System.Boolean |
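Because IsTokenChar(Int32) is a protected override, a subclass can widen or narrow the set of characters that form tokens. The sketch below is one possible approach under stated assumptions: `LetterOrDigitTokenizer` is a hypothetical class, not part of Lucene.NET, and the digit check relies on `System.Text.Rune`, which requires .NET Core 3.0 or later.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical subclass, not part of Lucene.NET: keeps digits in addition to letters.
public class LetterOrDigitTokenizer : LetterTokenizer
{
    public LetterOrDigitTokenizer(LuceneVersion matchVersion, TextReader input)
        : base(matchVersion, input)
    {
    }

    protected override bool IsTokenChar(int c)
    {
        // Accept everything LetterTokenizer accepts.
        if (base.IsTokenChar(c))
        {
            return true;
        }

        // TryCreate rejects values that are not valid Unicode scalar values
        // (for example lone surrogates), so no exception is thrown here.
        return Rune.TryCreate(c, out Rune rune) && Rune.IsDigit(rune);
    }
}
```

With this subclass, input such as `abc123 def` would be expected to tokenize as `abc123` and `def`, whereas LetterTokenizer itself would drop the digits and emit `abc` and `def`.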
Overrides
CharTokenizer.IsTokenChar(Int32)