Class StandardTokenizer
A grammar-based tokenizer constructed with JFlex.
As of Lucene version 3.1, this class implements the Word Break rules from the
Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
Many applications have specific tokenizer needs. If this tokenizer does
not suit your application, please consider copying this source code
directory to your project and maintaining your own grammar-based tokenizer.
You must specify the required Lucene.Net.Util.LuceneVersion
compatibility when creating StandardTokenizer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split
from their combining characters. If you specify an earlier version,
you get the old, broken behavior, preserved exactly for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation.
If you use a previous version number, you get the exact behavior of
ClassicTokenizer for backwards compatibility.
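The following is a minimal sketch of how a tokenizer like this is typically consumed, not a definitive usage pattern. It assumes Lucene.Net 4.8, where LuceneVersion.LUCENE_48 is available and ICharTermAttribute lives in the Lucene.Net.Analysis.TokenAttributes namespace. The class StandardTokenizerSketch and its Tokenize helper are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.Net.
public static class StandardTokenizerSketch
{
    // Collects the term text of every token produced for the given input.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();

        // The version argument selects the compatibility behavior described above.
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Register the attribute that exposes the current token's text.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken()) // advances to the next token, false at end of input
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();                   // signals end of stream
        }

        return terms;
    }
}
```

Calling StandardTokenizerSketch.Tokenize("Hello world") should yield the two terms "Hello" and "world"; the tokenizer itself does not lowercase.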
Inheritance
System.Object
Lucene.Net.Util.AttributeSource
Lucene.Net.Analysis.TokenStream
Lucene.Net.Analysis.Tokenizer
StandardTokenizer
Implements
System.IDisposable
Inherited Members
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class StandardTokenizer : Tokenizer, IDisposable
Constructors
StandardTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Creates a new StandardTokenizer with a given Lucene.Net.Util.AttributeSource.AttributeFactory.
Declaration
public StandardTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
Parameters
Type | Name | Description
Lucene.Net.Util.LuceneVersion | matchVersion |
Lucene.Net.Util.AttributeSource.AttributeFactory | factory |
System.IO.TextReader | input |
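As a hedged illustration of this overload, the sketch below passes the library's default attribute factory explicitly. It assumes Lucene.Net 4.8 and that AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY is the appropriate default in your version; FactoryConstructorSketch and CountTokens are hypothetical names used only here.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.Net.
public static class FactoryConstructorSketch
{
    // Builds a tokenizer with an explicit attribute factory and counts its tokens.
    public static int CountTokens(string text)
    {
        // Assumption: the default factory is sufficient; a custom factory could be passed instead.
        AttributeSource.AttributeFactory factory =
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

        int count = 0;
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, factory, new StringReader(text)))
        {
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                count++;
            }
            tokenizer.End();
        }
        return count;
    }
}
```

Passing a factory explicitly only matters when you need non-default attribute implementations; otherwise the two-argument constructor is the simpler choice.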
StandardTokenizer(LuceneVersion, TextReader)
Creates a new instance of the StandardTokenizer. Attaches the input to the newly created JFlex-generated (then ported to .NET) scanner.
Declaration
public StandardTokenizer(LuceneVersion matchVersion, TextReader input)
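Parameters
Type | Name | Description
Lucene.Net.Util.LuceneVersion | matchVersion |
System.IO.TextReader | input |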
Fields
ACRONYM
Declaration
[Obsolete("(3.1)")]
public const int ACRONYM = 2
Field Value
Type | Description
System.Int32 |
ACRONYM_DEP
Declaration
[Obsolete("(3.1)")]
public const int ACRONYM_DEP = 8
Field Value
Type | Description
System.Int32 |
ALPHANUM
Declaration
public const int ALPHANUM = 0
Field Value
Type | Description
System.Int32 |
APOSTROPHE
Declaration
[Obsolete("(3.1)")]
public const int APOSTROPHE = 1
Field Value
Type | Description
System.Int32 |
CJ
Declaration
[Obsolete("(3.1)")]
public const int CJ = 7
Field Value
Type | Description
System.Int32 |
COMPANY
Declaration
[Obsolete("(3.1)")]
public const int COMPANY = 3
Field Value
Type | Description
System.Int32 |
EMAIL
Declaration
public const int EMAIL = 4
Field Value
Type | Description
System.Int32 |
HANGUL
Declaration
public const int HANGUL = 13
Field Value
Type | Description
System.Int32 |
HIRAGANA
Declaration
public const int HIRAGANA = 11
Field Value
Type | Description
System.Int32 |
HOST
Declaration
[Obsolete("(3.1)")]
public const int HOST = 5
Field Value
Type | Description
System.Int32 |
IDEOGRAPHIC
Declaration
public const int IDEOGRAPHIC = 10
Field Value
Type | Description
System.Int32 |
KATAKANA
Declaration
public const int KATAKANA = 12
Field Value
Type | Description
System.Int32 |
NUM
Declaration
Field Value
Type | Description
System.Int32 |
SOUTHEAST_ASIAN
Declaration
public const int SOUTHEAST_ASIAN = 9
Field Value
Type | Description
System.Int32 |
TOKEN_TYPES
String token types that correspond to the token type int constants.
Declaration
public static readonly string[] TOKEN_TYPES
Field Value
Type | Description
System.String[] |
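To illustrate how the int constants relate to TOKEN_TYPES, the sketch below looks up a type name by constant and compares it with the type reported for each token. It assumes Lucene.Net 4.8, where ITypeAttribute exposes a Type string property; TokenTypeSketch and TypesOf are hypothetical names introduced for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.Net.
public static class TokenTypeSketch
{
    // Returns the type string reported for each token in the input.
    public static List<string> TypesOf(string text)
    {
        var types = new List<string>();
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // ITypeAttribute carries the lexical type of the current token.
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                types.Add(typeAtt.Type);
            }
            tokenizer.End();
        }
        return types;
    }

    // The int constants index into TOKEN_TYPES to give the matching type string.
    public static string NameOf(int tokenTypeConstant)
    {
        return StandardTokenizer.TOKEN_TYPES[tokenTypeConstant];
    }
}
```

For example, TokenTypeSketch.NameOf(StandardTokenizer.ALPHANUM) returns the same string that the tokenizer reports for an ordinary word such as "hello".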
Properties
MaxTokenLength
Gets or sets the maximum allowed token length. Any token longer than this is skipped.
Declaration
public int MaxTokenLength { get; set; }
Property Value
Type | Description
System.Int32 |
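The effect of this property can be sketched as follows, assuming Lucene.Net 4.8 and the skipping behavior stated above. MaxTokenLengthSketch and its Tokenize helper are hypothetical names, and the limit of 5 is chosen only to make the effect visible on a short input.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.Net.
public static class MaxTokenLengthSketch
{
    // Tokenizes the input, skipping any token longer than maxTokenLength.
    public static List<string> Tokenize(string text, int maxTokenLength)
    {
        var terms = new List<string>();
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Set the limit before consuming the stream.
            tokenizer.MaxTokenLength = maxTokenLength;

            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }
}
```

With a limit of 5, the input "short extraordinarily long" should produce only "short" and "long", because the middle word exceeds the limit and is skipped rather than truncated.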
Methods
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description
System.Boolean | disposing |
Overrides
End()
Declaration
public override sealed void End()
Overrides
Lucene.Net.Analysis.TokenStream.End()
IncrementToken()
Declaration
public override sealed bool IncrementToken()
Returns
Type | Description
System.Boolean |
Overrides
Lucene.Net.Analysis.TokenStream.IncrementToken()
Reset()
Declaration
public override void Reset()
Overrides
Lucene.Net.Analysis.Tokenizer.Reset()
Implements
System.IDisposable