    Class StandardTokenizer

    A grammar-based tokenizer constructed with JFlex.

    As of Lucene version 3.1, this class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

    Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating StandardTokenizer:

    • As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you specify an earlier version, you get the earlier (broken) behavior exactly, for backwards compatibility.
    • As of 3.1, StandardTokenizer implements Unicode text segmentation. If you specify an earlier version, you get the exact behavior of ClassicTokenizer, for backwards compatibility.

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    StandardTokenizer
    Implements
    System.IDisposable
    Inherited Members
    Lucene.Net.Analysis.Tokenizer.m_input
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Standard
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class StandardTokenizer : Tokenizer, IDisposable

    Constructors

    StandardTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)

    Creates a new StandardTokenizer that uses the given Lucene.Net.Util.AttributeSource.AttributeFactory.

    Declaration
    public StandardTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion
    Lucene.Net.Util.AttributeSource.AttributeFactory factory
    System.IO.TextReader input

    StandardTokenizer(LuceneVersion, TextReader)

    Creates a new instance of the StandardTokenizer. Attaches the input to the newly created JFlex-generated (then ported to .NET) scanner.

    Declaration
    public StandardTokenizer(LuceneVersion matchVersion, TextReader input)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion Lucene compatibility version. See the StandardTokenizer class description.
    System.IO.TextReader input The input reader. See http://issues.apache.org/jira/browse/LUCENE-1068
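The constructor and the overridden methods listed on this page are normally used together in one consume loop. The following is a minimal sketch of that loop, not an official sample. It assumes the Lucene.Net 4.8 packages are referenced, that `LuceneVersion.LUCENE_48` is the desired compatibility version, and that `ICharTermAttribute` lives in `Lucene.Net.Analysis.TokenAttributes`; the input string is purely illustrative.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class StandardTokenizerExample
{
    public static void Main()
    {
        // Illustrative input; any TextReader can be supplied.
        using (TextReader reader = new StringReader("Lucene.Net splits text into words."))
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            // Register the attribute that exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // returns false when the input is exhausted
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                    // performs end-of-stream operations
        }                                       // Dispose() releases the tokenizer and its reader
    }
}
```

The `using` statements rely on the tokenizer implementing System.IDisposable, as listed under Implements.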

    Fields

    ACRONYM

    Declaration
    [Obsolete("(3.1)")]
    public const int ACRONYM = 2
    Field Value
    Type Description
    System.Int32

    ACRONYM_DEP

    Declaration
    [Obsolete("(3.1)")]
    public const int ACRONYM_DEP = 8
    Field Value
    Type Description
    System.Int32

    ALPHANUM

    Declaration
    public const int ALPHANUM = 0
    Field Value
    Type Description
    System.Int32

    APOSTROPHE

    Declaration
    [Obsolete("(3.1)")]
    public const int APOSTROPHE = 1
    Field Value
    Type Description
    System.Int32

    CJ

    Declaration
    [Obsolete("(3.1)")]
    public const int CJ = 7
    Field Value
    Type Description
    System.Int32

    COMPANY

    Declaration
    [Obsolete("(3.1)")]
    public const int COMPANY = 3
    Field Value
    Type Description
    System.Int32

    EMAIL

    Declaration
    public const int EMAIL = 4
    Field Value
    Type Description
    System.Int32

    HANGUL

    Declaration
    public const int HANGUL = 13
    Field Value
    Type Description
    System.Int32

    HIRAGANA

    Declaration
    public const int HIRAGANA = 11
    Field Value
    Type Description
    System.Int32

    HOST

    Declaration
    [Obsolete("(3.1)")]
    public const int HOST = 5
    Field Value
    Type Description
    System.Int32

    IDEOGRAPHIC

    Declaration
    public const int IDEOGRAPHIC = 10
    Field Value
    Type Description
    System.Int32

    KATAKANA

    Declaration
    public const int KATAKANA = 12
    Field Value
    Type Description
    System.Int32

    NUM

    Declaration
    public const int NUM = 6
    Field Value
    Type Description
    System.Int32

    SOUTHEAST_ASIAN

    Declaration
    public const int SOUTHEAST_ASIAN = 9
    Field Value
    Type Description
    System.Int32

    TOKEN_TYPES

    String names of the token types, indexed by the token type int constants.

    Declaration
    public static readonly string[] TOKEN_TYPES
    Field Value
    Type Description
    System.String[]
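As an illustration of how the int constants and TOKEN_TYPES relate, the sketch below looks up a type name by constant and compares it with the type reported for each token. It assumes the Lucene.Net 4.8 packages, `LuceneVersion.LUCENE_48`, and that the tokenizer reports its token type through `ITypeAttribute.Type` using the strings stored in TOKEN_TYPES; the input string is illustrative.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenTypeExample
{
    public static void Main()
    {
        // Look up the string name that belongs to one of the int constants.
        string alphanumName = StandardTokenizer.TOKEN_TYPES[StandardTokenizer.ALPHANUM];
        Console.WriteLine(alphanumName);

        using (TextReader reader = new StringReader("room 42"))
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // Compare the reported type with the name stored for the NUM constant.
                bool isNumber = type.Type == StandardTokenizer.TOKEN_TYPES[StandardTokenizer.NUM];
                Console.WriteLine($"{term} -> {type.Type} (numeric: {isNumber})");
            }
            tokenizer.End();
        }
    }
}
```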

    Properties

    MaxTokenLength

    Gets or sets the maximum allowed token length. Any token longer than this is skipped.

    Declaration
    public int MaxTokenLength { get; set; }
    Property Value
    Type Description
    System.Int32
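A short sketch of the property in use follows. It assumes the Lucene.Net 4.8 packages and `LuceneVersion.LUCENE_48`; the limit of 5 and the input string are arbitrary values chosen for illustration, and the property is set before `Reset()` so that it applies to the whole stream.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthExample
{
    public static void Main()
    {
        using (TextReader reader = new StringReader("a tiny extraordinarily long word"))
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            // Illustrative limit: tokens longer than 5 characters are skipped.
            tokenizer.MaxTokenLength = 5;

            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();
        }
    }
}
```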

    Methods

    Dispose(Boolean)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing
    Overrides
    Tokenizer.Dispose(Boolean)

    End()

    Declaration
    public override sealed void End()
    Overrides
    Lucene.Net.Analysis.TokenStream.End()

    IncrementToken()

    Declaration
    public override sealed bool IncrementToken()
    Returns
    Type Description
    System.Boolean
    Overrides
    Lucene.Net.Analysis.TokenStream.IncrementToken()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Lucene.Net.Analysis.Tokenizer.Reset()

    Implements

    System.IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)