    Class PatternTokenizer

    This tokenizer uses regex pattern matching to construct distinct tokens from the input stream. It takes two arguments: "pattern" and "group".

    • "pattern" is the regular expression.
    • "group" says which group to extract into tokens.

    group=-1 (the default) is equivalent to "split". In this case, the tokens are the same as those produced by splitting the input on the pattern, with empty tokens removed.
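
    The split behaviour can be approximated with plain .NET regular expressions. The sketch below is an illustration, not the tokenizer's own implementation: the class and method names (SplitModeSketch, SplitTokens) are hypothetical, and it assumes the pattern contains no capturing groups, because Regex.Split also returns captured text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitModeSketch
{
    // Approximates group = -1: split the input on the pattern,
    // then drop the empty pieces, as the description above states.
    // Assumption: the pattern has no capturing groups.
    public static string[] SplitTokens(string input, Regex pattern)
    {
        return pattern.Split(input)
                      .Where(piece => piece.Length > 0)
                      .ToArray();
    }
}
```

    For example, SplitModeSketch.SplitTokens("aaa, bbb,,ccc", new Regex(@",\s*")) yields aaa, bbb and ccc; the empty piece between the two adjacent commas is discarded.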

    Using group >= 0 selects the matching group as the token. For example, if you have:

     pattern = \'([^\']+)\'
     group = 0
     input = aaa 'bbb' 'ccc'

    the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be bbb and ccc (without the ' marks).

    NOTE: This Tokenizer does not output tokens that are of zero length.
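
    The group selection described above can be mimicked with System.Text.RegularExpressions alone. This is a minimal sketch under stated assumptions, not the class's actual code: GroupModeSketch and GroupTokens are hypothetical names, and the zero-length filter simply mirrors the NOTE above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupModeSketch
{
    // Approximates group >= 0: every match contributes the text of
    // the chosen group; zero-length values are skipped.
    public static List<string> GroupTokens(string input, Regex pattern, int group)
    {
        var tokens = new List<string>();
        foreach (Match match in pattern.Matches(input))
        {
            string value = match.Groups[group].Value;
            if (value.Length > 0)
            {
                tokens.Add(value);
            }
        }
        return tokens;
    }
}
```

    With the pattern '([^']+)' and the input aaa 'bbb' 'ccc', calling GroupTokens with group 0 returns 'bbb' and 'ccc', while group 1 returns bbb and ccc.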

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    PatternTokenizer
    Implements
    IDisposable
    Inherited Members
    Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.Pattern
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class PatternTokenizer : Tokenizer, IDisposable

    Constructors

    PatternTokenizer(AttributeSource.AttributeFactory, TextReader, Regex, Int32)

    Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).

    Declaration
    public PatternTokenizer(AttributeSource.AttributeFactory factory, TextReader input, Regex pattern, int group)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader input
    Regex pattern
    System.Int32 group

    PatternTokenizer(TextReader, Regex, Int32)

    Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).

    Declaration
    public PatternTokenizer(TextReader input, Regex pattern, int group)
    Parameters
    Type Name Description
    TextReader input
    Regex pattern
    System.Int32 group

    Methods

    End()

    Declaration
    public override void End()
    Overrides
    TokenStream.End()

    IncrementToken()

    Declaration
    public override bool IncrementToken()
    Returns
    Type Description
    System.Boolean
    Overrides
    TokenStream.IncrementToken()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Tokenizer.Reset()
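
    The methods above are normally called in a fixed order: Reset(), then IncrementToken() until it returns false, then End(), then Dispose(). The following sketch shows one way to consume the tokenizer. It assumes the project references the Lucene.Net.Analysis.Common package and that term text is read through ICharTermAttribute from Lucene.Net.Analysis.TokenAttributes, which this page does not mention. The names PatternTokenizerUsage and Tokenize are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;

public static class PatternTokenizerUsage
{
    // Collects every token the tokenizer emits for the given input.
    public static List<string> Tokenize(string input, Regex pattern, int group)
    {
        var tokens = new List<string>();
        using (var tokenizer = new PatternTokenizer(new StringReader(input), pattern, group))
        {
            // Assumption: the term text is exposed via ICharTermAttribute.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must precede IncrementToken()
            while (tokenizer.IncrementToken())  // false once the input is exhausted
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                    // finalizes end-of-stream state
        }                                       // Dispose() releases the reader
        return tokens;
    }
}
```

    Following the example in the class description, Tokenize("aaa 'bbb' 'ccc'", new Regex(@"'([^']+)'"), 1) should produce the tokens bbb and ccc.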

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)