
    Class PatternTokenizer

    This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

    • "pattern" is the regular expression.
    • "group" says which group to extract into tokens.

    group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output of System.Text.RegularExpressions.Regex.Split(System.String), with empty tokens removed.
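The split behavior described above can be approximated with plain .NET regex calls. The following is a minimal sketch, not the tokenizer's actual implementation: the class name SplitModeSketch, the helper SplitTokens, and the sample input are hypothetical, and the pattern deliberately contains no capturing groups (Regex.Split would otherwise include the captured text in its result).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that mimics the documented group = -1 ("split") behavior.
public static class SplitModeSketch
{
    // Splits the input on every match of the pattern and drops empty pieces.
    public static string[] SplitTokens(string input, string pattern)
    {
        return new Regex(pattern)
            .Split(input)
            .Where(piece => piece.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Leading, trailing, and doubled separators produce empty pieces,
        // which the filter above removes.
        string[] tokens = SplitTokens(",aaa,,bbb, ccc,", @",\s*");
        Console.WriteLine(string.Join("|", tokens)); // prints "aaa|bbb|ccc"
    }
}
```

Here the pattern describes the separators rather than the tokens, which is the opposite of the group >= 0 modes.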

    Using group >= 0 selects the matching group as the token. For example, if you have:

     pattern = \'([^\']+)\'
     group = 0
     input = aaa 'bbb' 'ccc'

    the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (without the ' marks).
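The group selection in this example can be reproduced with System.Text.RegularExpressions alone. This is a sketch under stated assumptions rather than the tokenizer's implementation: GroupModeSketch and GroupTokens are hypothetical names, and the pattern is written as '([^']+)', which matches the same text as the escaped form shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that mimics the documented group >= 0 behavior.
public static class GroupModeSketch
{
    // Returns the text of the requested group for each match, skipping empty values.
    public static List<string> GroupTokens(string input, string pattern, int group)
    {
        var tokens = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            string value = match.Groups[group].Value;
            if (value.Length > 0)
            {
                tokens.Add(value);
            }
        }
        return tokens;
    }

    public static void Main()
    {
        const string pattern = @"'([^']+)'";
        const string input = "aaa 'bbb' 'ccc'";

        // Group 0 is the whole match, so the quote marks are kept.
        Console.WriteLine(string.Join(", ", GroupTokens(input, pattern, 0))); // prints "'bbb', 'ccc'"

        // Group 1 is the first capturing group, so the quote marks are dropped.
        Console.WriteLine(string.Join(", ", GroupTokens(input, pattern, 1))); // prints "bbb, ccc"
    }
}
```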

    NOTE: This Lucene.Net.Analysis.Tokenizer does not emit zero-length tokens.

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    PatternTokenizer
    Implements
    System.IDisposable
    Inherited Members
    Lucene.Net.Analysis.Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Pattern
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class PatternTokenizer : Tokenizer, IDisposable

    Constructors


    PatternTokenizer(AttributeSource.AttributeFactory, TextReader, Regex, Int32)

    Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).

    Declaration
    public PatternTokenizer(AttributeSource.AttributeFactory factory, TextReader input, Regex pattern, int group)
    Parameters
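    Type                                              Name     Description
    Lucene.Net.Util.AttributeSource.AttributeFactory  factory  The attribute factory used to create this tokenizer's attributes.
    System.IO.TextReader                              input    The reader supplying the text to tokenize.
    System.Text.RegularExpressions.Regex              pattern  The regular expression.
    System.Int32                                      group    The group to extract into tokens, or -1 for split functionality.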

    PatternTokenizer(TextReader, Regex, Int32)

    Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).

    Declaration
    public PatternTokenizer(TextReader input, Regex pattern, int group)
    Parameters
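    Type                                  Name     Description
    System.IO.TextReader                  input    The reader supplying the text to tokenize.
    System.Text.RegularExpressions.Regex  pattern  The regular expression.
    System.Int32                          group    The group to extract into tokens, or -1 for split functionality.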

    Methods


    End()

    Declaration
    public override void End()
    Overrides
    Lucene.Net.Analysis.TokenStream.End()

    IncrementToken()

    Declaration
    public override bool IncrementToken()
    Returns
    Type            Description
    System.Boolean  true if a token was produced; false once the end of the stream has been reached.
    Overrides
    Lucene.Net.Analysis.TokenStream.IncrementToken()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Lucene.Net.Analysis.Tokenizer.Reset()
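
The constructors and methods listed above are typically used together in the usual TokenStream consumer sequence: Reset(), a loop over IncrementToken(), then End(), then disposal. The sketch below assumes the Lucene.Net.Analysis.Common package is referenced and reads each token through ICharTermAttribute from Lucene.Net.Analysis.TokenAttributes, which this page does not mention; PatternTokenizerUsage and Tokenize are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical wrapper showing one way to consume a PatternTokenizer.
public static class PatternTokenizerUsage
{
    // Collects the tokens produced for the given text, pattern, and group.
    public static List<string> Tokenize(string text, Regex pattern, int group)
    {
        var tokens = new List<string>();
        using (var tokenizer = new PatternTokenizer(new StringReader(text), pattern, group))
        {
            // Assumption: the term text is exposed through ICharTermAttribute.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // returns false when no tokens remain
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                    // performs end-of-stream operations
        }
        return tokens;
    }

    public static void Main()
    {
        var pattern = new Regex(@"'([^']+)'");
        foreach (string token in Tokenize("aaa 'bbb' 'ccc'", pattern, 1))
        {
            Console.WriteLine(token);
        }
    }
}
```

With the input and pattern from the class description and group=1, the description above implies the loop yields bbb and ccc.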

    Implements

    System.IDisposable

    See Also

    System.Text.RegularExpressions.Regex
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.