    Class StandardTokenizer

    A grammar-based tokenizer constructed with JFlex.

    As of Lucene version 3.1, this class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

    Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.

    You must specify the required LuceneVersion compatibility when creating StandardTokenizer:

    • As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you specify an earlier version, you get the earlier (broken) behavior, preserved for backwards compatibility.
    • As of 3.1, StandardTokenizer implements Unicode text segmentation. If you specify an earlier version, you get the exact behavior of ClassicTokenizer, preserved for backwards compatibility.
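
    The usual consumer workflow for this class (Reset, then IncrementToken until it returns false, then End, then Dispose) can be sketched as follows. This is a minimal sketch, not the library's own sample: the class and method names (StandardTokenizerExample, Tokenize) are hypothetical, and LuceneVersion.LUCENE_48 is assumed as the compatibility version.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class StandardTokenizerExample
{
    // Hypothetical helper: returns the term text of every token in the input.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();

        // LUCENE_48 is an assumed compatibility version; pass the one your index requires.
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advances to the next token, false at end of input
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();                    // records end-of-stream state (for example, the final offset)
        }

        return terms;
    }
}
```

    With this sketch, Tokenize("Hello, world") yields the two terms "Hello" and "world". StandardTokenizer does not lowercase on its own, so case is preserved.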

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    StandardTokenizer
    Implements
    IDisposable
    Inherited Members
    Tokenizer.m_input
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.Standard
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class StandardTokenizer : Tokenizer, IDisposable

    Constructors


    StandardTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)

    Creates a new StandardTokenizer that uses the given AttributeSource.AttributeFactory.

    Declaration
    public StandardTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    LuceneVersion matchVersion
    AttributeSource.AttributeFactory factory
    TextReader input

    StandardTokenizer(LuceneVersion, TextReader)

    Creates a new instance of the StandardTokenizer. Attaches the input to the newly created JFlex-generated (then ported to .NET) scanner.

    Declaration
    public StandardTokenizer(LuceneVersion matchVersion, TextReader input)
    Parameters
    Type Name Description
    LuceneVersion matchVersion Lucene compatibility version. See StandardTokenizer.
    TextReader input The input reader. See http://issues.apache.org/jira/browse/LUCENE-1068

    Fields


    ACRONYM

    Declaration
    public const int ACRONYM = 2
    Field Value
    Type Description
    System.Int32

    ACRONYM_DEP

    Declaration
    public const int ACRONYM_DEP = 8
    Field Value
    Type Description
    System.Int32

    ALPHANUM

    Declaration
    public const int ALPHANUM = 0
    Field Value
    Type Description
    System.Int32

    APOSTROPHE

    Declaration
    public const int APOSTROPHE = 1
    Field Value
    Type Description
    System.Int32

    CJ

    Declaration
    public const int CJ = 7
    Field Value
    Type Description
    System.Int32

    COMPANY

    Declaration
    public const int COMPANY = 3
    Field Value
    Type Description
    System.Int32

    EMAIL

    Declaration
    public const int EMAIL = 4
    Field Value
    Type Description
    System.Int32

    HANGUL

    Declaration
    public const int HANGUL = 13
    Field Value
    Type Description
    System.Int32

    HIRAGANA

    Declaration
    public const int HIRAGANA = 11
    Field Value
    Type Description
    System.Int32

    HOST

    Declaration
    public const int HOST = 5
    Field Value
    Type Description
    System.Int32

    IDEOGRAPHIC

    Declaration
    public const int IDEOGRAPHIC = 10
    Field Value
    Type Description
    System.Int32

    KATAKANA

    Declaration
    public const int KATAKANA = 12
    Field Value
    Type Description
    System.Int32

    NUM

    Declaration
    public const int NUM = 6
    Field Value
    Type Description
    System.Int32

    SOUTHEAST_ASIAN

    Declaration
    public const int SOUTHEAST_ASIAN = 9
    Field Value
    Type Description
    System.Int32

    TOKEN_TYPES

    String token types, indexed by the corresponding token type int constants.

    Declaration
    public static readonly string[] TOKEN_TYPES
    Field Value
    Type Description
    System.String[]
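
    To illustrate how the int constants relate to TOKEN_TYPES, the sketch below reads each token's type string through ITypeAttribute. The class and method names (TokenTypeExample, TokenizeWithTypes) are hypothetical, and LuceneVersion.LUCENE_48 is an assumed compatibility version.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenTypeExample
{
    // Hypothetical helper: returns (term, type) pairs for every token in the input.
    public static List<KeyValuePair<string, string>> TokenizeWithTypes(string text)
    {
        var result = new List<KeyValuePair<string, string>>();

        // LUCENE_48 is an assumed compatibility version.
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // typeAtt.Type holds one of the strings from StandardTokenizer.TOKEN_TYPES.
                result.Add(new KeyValuePair<string, string>(termAtt.ToString(), typeAtt.Type));
            }
            tokenizer.End();
        }

        return result;
    }

    // Looks up the type string for a token type constant, e.g. StandardTokenizer.NUM.
    public static string TypeName(int tokenType)
    {
        return StandardTokenizer.TOKEN_TYPES[tokenType];
    }
}
```

    For example, a caller could compare typeAtt.Type against TypeName(StandardTokenizer.NUM) to pick out numeric tokens without hard-coding the type string.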

    Properties


    MaxTokenLength

    Gets or sets the maximum allowed token length. Any token longer than this is skipped.

    Declaration
    public int MaxTokenLength { get; set; }
    Property Value
    Type Description
    System.Int32
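
    The effect of MaxTokenLength can be sketched as below: tokens longer than the limit are dropped from the stream rather than truncated. The class and method names (MaxTokenLengthExample, TokenizeWithLimit) are hypothetical, and LuceneVersion.LUCENE_48 is an assumed compatibility version.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthExample
{
    // Hypothetical helper: tokenizes text, skipping tokens longer than maxTokenLength.
    public static List<string> TokenizeWithLimit(string text, int maxTokenLength)
    {
        var terms = new List<string>();

        // LUCENE_48 is an assumed compatibility version.
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Set the limit before consuming the stream.
            tokenizer.MaxTokenLength = maxTokenLength;

            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }

        return terms;
    }
}
```

    With a limit of 5, the input "short extraordinarily long" would produce only "short" and "long", since "extraordinarily" exceeds the limit and is skipped.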

    Methods


    Dispose(Boolean)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing

    End()

    Declaration
    public override sealed void End()
    Overrides
    TokenStream.End()

    IncrementToken()

    Declaration
    public override sealed bool IncrementToken()
    Returns
    Type Description
    System.Boolean
    Overrides
    TokenStream.IncrementToken()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Tokenizer.Reset()

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)