Class StandardTokenizer
A grammar-based tokenizer constructed with JFlex.
As of Lucene version 3.1, this class implements the Word Break rules from the
Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
Many applications have specific tokenizer needs. If this tokenizer does
not suit your application, please consider copying this source code
directory to your project and maintaining your own grammar-based tokenizer.
You must specify the required LuceneVersion
compatibility when creating StandardTokenizer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split
from their combining characters. If you use a previous version number,
you get the exact broken behavior for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation.
If you use a previous version number, you get the exact behavior of
ClassicTokenizer for backwards compatibility.
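The effect of the version argument can be explored with a small sketch like the one below. It assumes the Lucene.Net 4.8 packages, in which `LuceneVersion.LUCENE_48` and the older (obsolete, so a compiler warning is expected) `LuceneVersion.LUCENE_30` are available. `MatchVersionDemo` and `Terms` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MatchVersionDemo
{
    // Hypothetical helper: tokenizes text under the given compatibility version
    // and returns the term text of each token.
    public static List<string> Terms(LuceneVersion matchVersion, string text)
    {
        var terms = new List<string>();
        using (var tokenizer = new StandardTokenizer(matchVersion, new StringReader(text)))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }
}
```

Calling `MatchVersionDemo.Terms` once with `LuceneVersion.LUCENE_48` and once with a pre-3.1 value on the same input is one way to see whether the Unicode segmentation rules or the ClassicTokenizer rules are in effect for your data.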
Inheritance
System.Object
StandardTokenizer
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class StandardTokenizer : Tokenizer, IDisposable
Constructors
StandardTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Declaration
public StandardTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
Parameters
StandardTokenizer(LuceneVersion, TextReader)
Creates a new instance of the StandardTokenizer. Attaches the input
to the newly created JFlex-generated (then ported to .NET) scanner.
Declaration
public StandardTokenizer(LuceneVersion matchVersion, TextReader input)
Parameters
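A minimal sketch of both constructor overloads follows, assuming the Lucene.Net 4.8 packages. The use of `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` as the explicit factory is an assumption about the library version documented here. `ConstructorDemo` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ConstructorDemo
{
    // Two-argument overload: the tokenizer picks its own attribute factory.
    public static StandardTokenizer WithDefaultFactory(string text)
    {
        return new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
    }

    // Three-argument overload: the attribute factory is passed explicitly.
    // DEFAULT_ATTRIBUTE_FACTORY is assumed to exist in this library version.
    public static StandardTokenizer WithExplicitFactory(string text)
    {
        return new StandardTokenizer(
            LuceneVersion.LUCENE_48,
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader(text));
    }

    // Hypothetical helper: consumes the tokenizer, disposes it, and returns the terms.
    public static List<string> Drain(Tokenizer tokenizer)
    {
        var terms = new List<string>();
        using (tokenizer)
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }
}
```

Under these assumptions both overloads should produce the same tokens for the same input. The explicit-factory overload mainly matters when custom attribute implementations are needed.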
Fields
ACRONYM
Declaration
public const int ACRONYM = 2
Field Value
Type | Description
System.Int32 |
ACRONYM_DEP
Declaration
public const int ACRONYM_DEP = 8
Field Value
Type | Description
System.Int32 |
ALPHANUM
Declaration
public const int ALPHANUM = 0
Field Value
Type | Description
System.Int32 |
APOSTROPHE
Declaration
public const int APOSTROPHE = 1
Field Value
Type | Description
System.Int32 |
CJ
Declaration
public const int CJ = 7
Field Value
Type | Description
System.Int32 |
COMPANY
Declaration
public const int COMPANY = 3
Field Value
Type | Description
System.Int32 |
EMAIL
Declaration
public const int EMAIL = 4
Field Value
Type | Description
System.Int32 |
HANGUL
Declaration
public const int HANGUL = 13
Field Value
Type | Description
System.Int32 |
HIRAGANA
Declaration
public const int HIRAGANA = 11
Field Value
Type | Description
System.Int32 |
HOST
Declaration
public const int HOST = 5
Field Value
Type | Description
System.Int32 |
IDEOGRAPHIC
Declaration
public const int IDEOGRAPHIC = 10
Field Value
Type | Description
System.Int32 |
KATAKANA
Declaration
public const int KATAKANA = 12
Field Value
Type | Description
System.Int32 |
NUM
Declaration
public const int NUM = 6
Field Value
Type | Description
System.Int32 |
SOUTHEAST_ASIAN
Declaration
public const int SOUTHEAST_ASIAN = 9
Field Value
Type | Description
System.Int32 |
TOKEN_TYPES
String token types that correspond to the token type int constants.
Declaration
public static readonly string[] TOKEN_TYPES
Field Value
Type | Description
System.String[] |
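The relationship between the int constants and TOKEN_TYPES can be sketched as follows, assuming the Lucene.Net 4.8 packages and that the tokenizer reports each token's type through `ITypeAttribute`. `TokenTypeDemo` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenTypeDemo
{
    // Looks up the string form of a token type constant,
    // for example StandardTokenizer.NUM.
    public static string NameOf(int tokenType)
    {
        return StandardTokenizer.TOKEN_TYPES[tokenType];
    }

    // Hypothetical helper: returns "term:type" for every token in the text.
    public static List<string> TermsWithTypes(string text)
    {
        var result = new List<string>();
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            var typeAtt = tokenizer.AddAttribute<ITypeAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                result.Add(termAtt.ToString() + ":" + typeAtt.Type);
            }
            tokenizer.End();
        }
        return result;
    }
}
```

Comparing `typeAtt.Type` against `TokenTypeDemo.NameOf(StandardTokenizer.NUM)` is one way a downstream filter could, for instance, treat numeric tokens differently from alphanumeric ones.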
Properties
MaxTokenLength
Gets or sets the maximum allowed token length. Any token longer
than this is skipped.
Declaration
public int MaxTokenLength { get; set; }
Property Value
Type | Description
System.Int32 |
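The skipping behavior can be illustrated with a short sketch, assuming the Lucene.Net 4.8 packages and that the property may be set before the stream is consumed. `MaxTokenLengthDemo` and `TermsWithLimit` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthDemo
{
    // Hypothetical helper: tokenizes text, skipping tokens longer than maxTokenLength.
    public static List<string> TermsWithLimit(string text, int maxTokenLength)
    {
        var terms = new List<string>();
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Tokens longer than this limit are skipped rather than truncated.
            tokenizer.MaxTokenLength = maxTokenLength;

            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }
}
```

With a limit of 2, an input such as "a bb cccc" would be expected to yield only "a" and "bb", because "cccc" exceeds the limit and is dropped.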
Methods
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description
System.Boolean | disposing |
End()
Declaration
public override sealed void End()
Overrides
IncrementToken()
Declaration
public override sealed bool IncrementToken()
Returns
Type | Description
System.Boolean |
Overrides
Reset()
Declaration
public override void Reset()
Overrides
Implements
IDisposable
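The methods above are normally called in a fixed order: Reset(), then IncrementToken() until it returns false, then End(), and finally Dispose(). The sketch below shows that order, assuming the Lucene.Net 4.8 packages and that offsets are exposed through `IOffsetAttribute`. `LifecycleDemo` and `Describe` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class LifecycleDemo
{
    // Hypothetical helper: returns "term[start,end)" for each token in the text.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();

        // The using block calls Dispose() once the stream has been consumed.
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            var offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // returns false when the input is exhausted
            {
                lines.Add(termAtt.ToString()
                    + "[" + offsetAtt.StartOffset + "," + offsetAtt.EndOffset + ")");
            }
            tokenizer.End();                    // performs end-of-stream work such as the final offset
        }
        return lines;
    }
}
```

Skipping Reset() or calling IncrementToken() after it has returned false is a common source of exceptions, so keeping the whole sequence inside one helper like this can make misuse less likely.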