    Class PatternTokenizerFactory

    Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

    • "pattern" is the regular expression.
    • "group" says which group to extract into tokens.

    group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output (without empty tokens) from: System.Text.RegularExpressions.Regex.Split(System.String,System.String)

    Using group >= 0 selects the matching group as the token. For example, if you have:

        pattern = \'([^\']+)\'
        group = 0
        input = aaa 'bbb' 'ccc'

    the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be bbb and ccc (no ' marks).

    NOTE: This Tokenizer does not output tokens that are of zero length.
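The "group" semantics described above can be sketched with System.Text.RegularExpressions alone. This is a minimal illustration, not the PatternTokenizer implementation: the class PatternGroupSketch and its Tokenize helper are hypothetical names introduced here, and the helper simply mirrors the documented rules (split on matches when group is negative, take the selected group otherwise, never emit zero-length tokens).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternGroupSketch
{
    // Hypothetical helper, not part of Lucene.NET: mimics the documented
    // "pattern" / "group" behavior using only the .NET regex classes.
    public static List<string> Tokenize(string input, string pattern, int group = -1)
    {
        var regex = new Regex(pattern);
        var tokens = new List<string>();
        int last = 0; // end of the previous match (used in split mode only)

        foreach (Match m in regex.Matches(input))
        {
            // group < 0: the text between matches is the token ("split" mode).
            // group >= 0: the text of the selected group is the token.
            string token = group < 0
                ? input.Substring(last, m.Index - last)
                : m.Groups[group].Value;
            last = m.Index + m.Length;

            if (token.Length > 0) tokens.Add(token); // skip zero-length tokens
        }

        // In split mode, the text after the final match is the last token.
        if (group < 0 && last < input.Length) tokens.Add(input.Substring(last));
        return tokens;
    }

    public static void Main()
    {
        const string input = "aaa 'bbb' 'ccc'";
        const string pattern = @"'([^']+)'";

        Console.WriteLine(string.Join("|", Tokenize(input, pattern, 0))); // 'bbb'|'ccc'
        Console.WriteLine(string.Join("|", Tokenize(input, pattern, 1))); // bbb|ccc
        Console.WriteLine(string.Join("|", Tokenize("a, b,,c", @",\s*"))); // a|b|c
    }
}
```

The last call uses the default group of -1, so the pattern acts as a delimiter; the empty string between the two adjacent commas is dropped, in line with the note on zero-length tokens.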

    <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
      </analyzer>
    </fieldType>

    @since solr1.2

    Inheritance
    System.Object
    AbstractAnalysisFactory
    TokenizerFactory
    PatternTokenizerFactory
    Inherited Members
    TokenizerFactory.ForName(String, IDictionary<String, String>)
    TokenizerFactory.LookupClass(String)
    TokenizerFactory.AvailableTokenizers
    TokenizerFactory.ReloadTokenizers()
    TokenizerFactory.Create(TextReader)
    AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM
    AbstractAnalysisFactory.m_luceneMatchVersion
    AbstractAnalysisFactory.OriginalArgs
    AbstractAnalysisFactory.AssureMatchVersion()
    AbstractAnalysisFactory.LuceneMatchVersion
    AbstractAnalysisFactory.Require(IDictionary<String, String>, String)
    AbstractAnalysisFactory.Require(IDictionary<String, String>, String, ICollection<String>)
    AbstractAnalysisFactory.Require(IDictionary<String, String>, String, ICollection<String>, Boolean)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, String)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, ICollection<String>)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, ICollection<String>, String)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, ICollection<String>, String, Boolean)
    AbstractAnalysisFactory.RequireInt32(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetInt32(IDictionary<String, String>, String, Int32)
    AbstractAnalysisFactory.RequireBoolean(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetBoolean(IDictionary<String, String>, String, Boolean)
    AbstractAnalysisFactory.RequireSingle(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetSingle(IDictionary<String, String>, String, Single)
    AbstractAnalysisFactory.RequireChar(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetChar(IDictionary<String, String>, String, Char)
    AbstractAnalysisFactory.GetSet(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetPattern(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetCulture(IDictionary<String, String>, String, CultureInfo)
    AbstractAnalysisFactory.GetWordSet(IResourceLoader, String, Boolean)
    AbstractAnalysisFactory.GetLines(IResourceLoader, String)
    AbstractAnalysisFactory.GetSnowballWordSet(IResourceLoader, String, Boolean)
    AbstractAnalysisFactory.SplitFileNames(String)
    AbstractAnalysisFactory.GetClassArg()
    AbstractAnalysisFactory.IsExplicitLuceneMatchVersion
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Pattern
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class PatternTokenizerFactory : TokenizerFactory

    Constructors


    PatternTokenizerFactory(IDictionary<String, String>)

    Creates a new PatternTokenizerFactory.

    Declaration
    public PatternTokenizerFactory(IDictionary<string, string> args)
    Parameters
    Type Name Description
    System.Collections.Generic.IDictionary<System.String, System.String> args The factory arguments, keyed by name: "pattern" and, optionally, "group".

    Fields


    GROUP

    Declaration
    public const string GROUP = "group"
    Field Value
    Type Description
    System.String

    m_group

    Declaration
    protected readonly int m_group
    Field Value
    Type Description
    System.Int32

    m_pattern

    Declaration
    protected readonly Regex m_pattern
    Field Value
    Type Description
    System.Text.RegularExpressions.Regex

    PATTERN

    Declaration
    public const string PATTERN = "pattern"
    Field Value
    Type Description
    System.String

    Methods


    Create(AttributeSource.AttributeFactory, TextReader)

    Splits the input using the configured pattern.

    Declaration
    public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    Lucene.Net.Util.AttributeSource.AttributeFactory factory
    System.IO.TextReader input
    Returns
    Type Description
    Lucene.Net.Analysis.Tokenizer
    Overrides
    TokenizerFactory.Create(AttributeSource.AttributeFactory, TextReader)
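As a rough usage sketch, the factory can be built from an argument dictionary and the resulting tokenizer consumed through the usual token stream loop. The example assumes the Lucene.Net and Lucene.Net.Analysis.Common packages are referenced; the class name PatternTokenizerFactoryExample and its Tokenize helper are hypothetical, and the one-argument Create(TextReader) overload is the one listed under Inherited Members.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;

public static class PatternTokenizerFactoryExample
{
    // Hypothetical helper: tokenizes text with pattern '([^']+)' and group 1.
    public static List<string> Tokenize(string text)
    {
        // Argument names come from the PATTERN and GROUP constants.
        var args = new Dictionary<string, string>
        {
            { PatternTokenizerFactory.PATTERN, @"'([^']+)'" },
            { PatternTokenizerFactory.GROUP, "1" },
        };
        var factory = new PatternTokenizerFactory(args);

        var tokens = new List<string>();
        using (Tokenizer tokenizer = factory.Create(new StringReader(text)))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // required before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advance until the stream is exhausted
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                    // finalize end-of-stream state
        }
        return tokens;
    }

    public static void Main()
    {
        // Per the class description, group=1 should yield bbb and ccc for this input.
        Console.WriteLine(string.Join(", ", Tokenize("aaa 'bbb' 'ccc'")));
    }
}
```

Using the constants rather than string literals keeps the argument names in sync with the factory, and the using block disposes the tokenizer once the stream has been read.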

    See Also

    PatternTokenizer
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.