    Class PatternTokenizerFactory

    Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

    • "pattern" is the regular expression.
    • "group" says which group to extract into tokens.

    group=-1 (the default) is equivalent to "split". In this case, the tokens are the pieces of the input between successive matches of the pattern, with empty tokens omitted.
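The split behavior can be modeled with plain .NET regular expressions. This is a minimal sketch, not the Lucene.NET implementation (which also records character offsets); the helper `SplitTokens` is hypothetical. It walks the matches by hand instead of calling `Regex.Split`, because `Regex.Split` also returns the text captured by any capturing groups in the pattern, which a split-mode tokenizer does not emit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.NET: models group=-1 by emitting the
// text between consecutive pattern matches and skipping zero-length pieces.
static List<string> SplitTokens(string text, string regex)
{
    var result = new List<string>();
    int start = 0; // beginning of the input not yet consumed
    foreach (Match m in Regex.Matches(text, regex))
    {
        if (m.Index > start)                        // non-empty text before this match
            result.Add(text.Substring(start, m.Index - start));
        start = m.Index + m.Length;                 // resume right after the match
    }
    if (start < text.Length)                        // non-empty text after the last match
        result.Add(text.Substring(start));
    return result;
}

// Leading, trailing, and repeated separators yield no empty tokens.
List<string> tokens = SplitTokens(", foo,  bar ,baz,", @"[,\s]+");
Console.WriteLine(string.Join("|", tokens)); // foo|bar|baz
```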

    Using group >= 0 selects the matching group as the token. For example, given:

        pattern = \'([^\']+)\'
        group = 0
        input = aaa 'bbb' 'ccc'

    the output will be two tokens, 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (without the ' marks).

    NOTE: This Tokenizer does not output tokens that are of zero length.
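The group-extraction mode and the zero-length rule can be sketched the same way. Again this is only a model of the documented behavior built on `System.Text.RegularExpressions`; the helper `GroupTokens` is hypothetical and not part of Lucene.NET. It reproduces the 'bbb'/'ccc' example and shows an empty capture being dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.NET: models group >= 0 by emitting the
// text captured by the chosen group in each match, skipping empty captures.
static List<string> GroupTokens(string text, string regex, int group)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(text, regex))
    {
        Group g = m.Groups[group];
        if (g.Success && g.Length > 0)   // zero-length tokens are never emitted
            result.Add(g.Value);
    }
    return result;
}

// The documented example: group 0 keeps the quotes, group 1 drops them.
Console.WriteLine(string.Join(" ", GroupTokens("aaa 'bbb' 'ccc'", @"\'([^\']+)\'", 0))); // 'bbb' 'ccc'
Console.WriteLine(string.Join(" ", GroupTokens("aaa 'bbb' 'ccc'", @"\'([^\']+)\'", 1))); // bbb ccc

// With [^']* the empty quotes '' still match, but their empty group 1 is skipped.
Console.WriteLine(string.Join(" ", GroupTokens("'' 'x'", @"'([^']*)'", 1))); // x
```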

    <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
      </analyzer>
    </fieldType>
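The same tokenizer can be set up from code by passing the XML attributes as the constructor's argument dictionary. A minimal sketch, assuming a project that references the Lucene.Net.Analysis.Common package (4.8 line); the console-program shape is illustrative only. It uses the `PATTERN` and `GROUP` constants documented under Fields as the dictionary keys and consumes the tokenizer with the usual Reset / IncrementToken / End sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;

// Code equivalent of:
// <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
var factoryArgs = new Dictionary<string, string>
{
    [PatternTokenizerFactory.PATTERN] = @"\'([^\']+)\'",
    [PatternTokenizerFactory.GROUP] = "1",
};
var factory = new PatternTokenizerFactory(factoryArgs);

var tokens = new List<string>();
using (Tokenizer tokenizer = factory.Create(new StringReader("aaa 'bbb' 'ccc'")))
{
    ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
    tokenizer.Reset();                  // must be called before the first IncrementToken()
    while (tokenizer.IncrementToken())
        tokens.Add(term.ToString());
    tokenizer.End();
}
Console.WriteLine(string.Join(" ", tokens)); // bbb ccc
```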

    @since solr1.2

    Inheritance
    object
    AbstractAnalysisFactory
    TokenizerFactory
    PatternTokenizerFactory
    Inherited Members
    TokenizerFactory.ForName(string, IDictionary<string, string>)
    TokenizerFactory.LookupClass(string)
    TokenizerFactory.AvailableTokenizers
    TokenizerFactory.ReloadTokenizers()
    TokenizerFactory.Create(TextReader)
    AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM
    AbstractAnalysisFactory.m_luceneMatchVersion
    AbstractAnalysisFactory.OriginalArgs
    AbstractAnalysisFactory.AssureMatchVersion()
    AbstractAnalysisFactory.LuceneMatchVersion
    AbstractAnalysisFactory.Require(IDictionary<string, string>, string)
    AbstractAnalysisFactory.Require(IDictionary<string, string>, string, ICollection<string>)
    AbstractAnalysisFactory.Require(IDictionary<string, string>, string, ICollection<string>, bool)
    AbstractAnalysisFactory.Get(IDictionary<string, string>, string, string)
    AbstractAnalysisFactory.Get(IDictionary<string, string>, string, ICollection<string>)
    AbstractAnalysisFactory.Get(IDictionary<string, string>, string, ICollection<string>, string)
    AbstractAnalysisFactory.Get(IDictionary<string, string>, string, ICollection<string>, string, bool)
    AbstractAnalysisFactory.RequireInt32(IDictionary<string, string>, string)
    AbstractAnalysisFactory.GetInt32(IDictionary<string, string>, string, int)
    AbstractAnalysisFactory.RequireBoolean(IDictionary<string, string>, string)
    AbstractAnalysisFactory.GetBoolean(IDictionary<string, string>, string, bool)
    AbstractAnalysisFactory.RequireSingle(IDictionary<string, string>, string)
    AbstractAnalysisFactory.GetSingle(IDictionary<string, string>, string, float)
    AbstractAnalysisFactory.RequireChar(IDictionary<string, string>, string)
    AbstractAnalysisFactory.GetChar(IDictionary<string, string>, string, char)
    AbstractAnalysisFactory.GetSet(IDictionary<string, string>, string)
    AbstractAnalysisFactory.GetPattern(IDictionary<string, string>, string)
    AbstractAnalysisFactory.GetCulture(IDictionary<string, string>, string, CultureInfo)
    AbstractAnalysisFactory.GetWordSet(IResourceLoader, string, bool)
    AbstractAnalysisFactory.GetLines(IResourceLoader, string)
    AbstractAnalysisFactory.GetSnowballWordSet(IResourceLoader, string, bool)
    AbstractAnalysisFactory.SplitFileNames(string)
    AbstractAnalysisFactory.GetClassArg()
    AbstractAnalysisFactory.IsExplicitLuceneMatchVersion
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Pattern
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class PatternTokenizerFactory : TokenizerFactory

    Constructors

    PatternTokenizerFactory(IDictionary<string, string>)

    Creates a new PatternTokenizerFactory

    Declaration
    public PatternTokenizerFactory(IDictionary<string, string> args)
    Parameters
    Type Name Description
    IDictionary<string, string> args  The factory arguments: "pattern" (required) and, optionally, "group" (default -1).
    See Also
    PatternTokenizer
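The sketch below illustrates how the constructor treats its arguments. It assumes the Lucene.Net.Analysis.Common package, and it assumes, following the upstream Lucene factory contract, that a missing "pattern" and any unrecognized key are both reported as `ArgumentException`; verify the exact exception against the Lucene.NET version you use. The misspelled key "grup" is a deliberate, hypothetical typo.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Pattern;

// "pattern" is required; "group" may be omitted and then defaults to -1 (split mode).
var splitFactory = new PatternTokenizerFactory(new Dictionary<string, string>
{
    ["pattern"] = @"\s+",
});

// Assumption: a missing "pattern" is reported as ArgumentException.
bool missingPatternRejected = false;
try
{
    _ = new PatternTokenizerFactory(new Dictionary<string, string> { ["group"] = "1" });
}
catch (ArgumentException e)
{
    missingPatternRejected = true;
    Console.WriteLine("missing pattern: " + e.Message);
}

// Assumption: leftover, unrecognized keys (here "grup") are rejected as well.
bool unknownKeyRejected = false;
try
{
    _ = new PatternTokenizerFactory(new Dictionary<string, string> { ["pattern"] = @"\s+", ["grup"] = "1" });
}
catch (ArgumentException e)
{
    unknownKeyRejected = true;
    Console.WriteLine("unknown key: " + e.Message);
}
```

Rejecting unknown keys means a typo in a configuration fails loudly at construction time instead of silently falling back to split mode.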

    Fields

    GROUP

    Declaration
    public const string GROUP = "group"
    Field Value
    Type Description
    string
    See Also
    PatternTokenizer

    PATTERN

    Declaration
    public const string PATTERN = "pattern"
    Field Value
    Type Description
    string
    See Also
    PatternTokenizer

    m_group

    Declaration
    protected readonly int m_group
    Field Value
    Type Description
    int
    See Also
    PatternTokenizer

    m_pattern

    Declaration
    protected readonly Regex m_pattern
    Field Value
    Type Description
    Regex
    See Also
    PatternTokenizer

    Methods

    Create(AttributeFactory, TextReader)

    Creates a PatternTokenizer that tokenizes the input using the configured pattern and group.

    Declaration
    public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader input
    Returns
    Type Description
    Tokenizer
    Overrides
    TokenizerFactory.Create(AttributeSource.AttributeFactory, TextReader)
    See Also
    PatternTokenizer
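This overload lets the caller choose the attribute factory. A minimal sketch, assuming the Lucene.Net.Analysis.Common package and the `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` field; it splits a comma-separated list (group left at -1) and also reads the offset attribute to show which span of the input each token came from.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Split on commas with optional surrounding whitespace (group defaults to -1).
var factory = new PatternTokenizerFactory(new Dictionary<string, string> { ["pattern"] = @"\s*,\s*" });

var terms = new List<string>();
var offsets = new List<string>();
using (Tokenizer tokenizer = factory.Create(
    AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
    new StringReader("red, green ,blue")))
{
    ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
    IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();
    tokenizer.Reset();
    while (tokenizer.IncrementToken())
    {
        terms.Add(term.ToString());
        offsets.Add($"{offset.StartOffset}-{offset.EndOffset}"); // start inclusive, end exclusive
    }
    tokenizer.End();
}
Console.WriteLine(string.Join(" ", terms));   // red green blue
Console.WriteLine(string.Join(" ", offsets)); // 0-3 5-10 12-16
```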

    See Also

    PatternTokenizer
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.