A LetterTokenizer is a tokenizer that divides text at non-letters. That is,
it defines tokens as maximal strings of adjacent letters, as defined by the
java.lang.Character.isLetter() predicate.
Note: this does a decent job for most European languages, but does a terrible
job for some Asian languages, where words are not separated by spaces.
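The splitting rule described above can be illustrated with a small standalone sketch. It does not use the Lucene.Net API: the class name LetterSplitSketch and the helper letterTokens are hypothetical, and the code only mimics the "maximal runs of letters" rule with the java.lang.Character.isLetter() predicate named in the description. Any other behavior of the real tokenizer is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterSplitSketch {

    // Hypothetical helper (not part of Lucene.Net): returns the maximal
    // runs of adjacent letters in the input, in order of appearance.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Still inside a run of letters: extend the current token.
                current.append(c);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a token that runs to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes, digits, hyphens and punctuation all act as separators.
        System.out.println(letterTokens("It's 3 o'clock, e-mail me!"));

        // Four Chinese characters written as Unicode escapes. Each one is a
        // letter and there are no spaces, so the whole phrase is one token.
        String chinese = "\u6211\u7231\u5317\u4EAC";
        System.out.println(letterTokens(chinese).size());
    }
}
```

In this sketch the English sentence splits into It, s, o, clock, e, mail and me, while the unspaced Chinese phrase comes back as a single token. That is the weakness the note above describes for languages that do not separate words with spaces.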
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net (in Lucene.Net.dll) Version: 2.9.4.1
Syntax
C# |
---|
public class LetterTokenizer : CharTokenizer |
Visual Basic |
---|
Public Class LetterTokenizer _ Inherits CharTokenizer |
Visual C++ |
---|
public ref class LetterTokenizer : public CharTokenizer |
Inheritance Hierarchy
System.Object
  Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
      Lucene.Net.Analysis.Tokenizer
        Lucene.Net.Analysis.CharTokenizer
          Lucene.Net.Analysis.LetterTokenizer
            Lucene.Net.Analysis.AR.ArabicLetterTokenizer
            Lucene.Net.Analysis.LowerCaseTokenizer