Class PatternTokenizerFactory

Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

"pattern" is the regular expression.
"group" says which group to extract into tokens.

group=-1 (the default) is equivalent to "split". In this case, the tokens are the pieces of input between pattern matches, with empty tokens dropped, much like the output of Regex.Split(string).

Using group >= 0 selects the matching group as the token. For example, if you have:

pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output will be bbb and ccc (no ' marks).

NOTE: This Tokenizer does not output tokens that are of zero length.
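
The group semantics described above can be approximated with plain System.Text.RegularExpressions calls. The following is a minimal sketch, not the tokenizer's actual implementation: the class name PatternGroupSketch and its Tokenize method are hypothetical and exist only for illustration. Split mode is computed from the gaps between matches rather than with Regex.Split, because .NET's Regex.Split also returns captured group text when the pattern contains capturing groups.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that mimics the documented "group" behavior using Regex only.
public static class PatternGroupSketch
{
    public static List<string> Tokenize(string input, string pattern, int group)
    {
        var regex = new Regex(pattern);
        var tokens = new List<string>();

        if (group < 0)
        {
            // Split mode: emit the text between consecutive matches.
            int start = 0;
            foreach (Match m in regex.Matches(input))
            {
                if (m.Index > start)
                {
                    tokens.Add(input.Substring(start, m.Index - start));
                }
                start = m.Index + m.Length;
            }
            if (start < input.Length)
            {
                tokens.Add(input.Substring(start));
            }
        }
        else
        {
            // Group mode: emit the selected group of each match.
            foreach (Match m in regex.Matches(input))
            {
                string value = m.Groups[group].Value;
                if (value.Length > 0) // zero-length tokens are never emitted
                {
                    tokens.Add(value);
                }
            }
        }
        return tokens;
    }
}
```

With the example input, PatternGroupSketch.Tokenize("aaa 'bbb' 'ccc'", @"'([^']+)'", 0) yields 'bbb' and 'ccc', the same call with group 1 yields bbb and ccc, and group -1 with the pattern @"\s+" yields aaa, 'bbb' and 'ccc'.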

<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
  </analyzer>
</fieldType>

@since solr1.2

Inheritance

object

AbstractAnalysisFactory

TokenizerFactory

PatternTokenizerFactory

Inherited Members

TokenizerFactory.ForName(string, IDictionary<string, string>)

TokenizerFactory.LookupClass(string)

TokenizerFactory.AvailableTokenizers

TokenizerFactory.ReloadTokenizers()

TokenizerFactory.Create(TextReader)

AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM

AbstractAnalysisFactory.m_luceneMatchVersion

AbstractAnalysisFactory.OriginalArgs

AbstractAnalysisFactory.AssureMatchVersion()

AbstractAnalysisFactory.LuceneMatchVersion

AbstractAnalysisFactory.Require(IDictionary<string, string>, string)

AbstractAnalysisFactory.Require(IDictionary<string, string>, string, ICollection<string>)

AbstractAnalysisFactory.Require(IDictionary<string, string>, string, ICollection<string>, bool)

AbstractAnalysisFactory.Get(IDictionary<string, string>, string, string)

AbstractAnalysisFactory.Get(IDictionary<string, string>, string, ICollection<string>)

AbstractAnalysisFactory.Get(IDictionary<string, string>, string, ICollection<string>, string)

AbstractAnalysisFactory.Get(IDictionary<string, string>, string, ICollection<string>, string, bool)

AbstractAnalysisFactory.RequireInt32(IDictionary<string, string>, string)

AbstractAnalysisFactory.GetInt32(IDictionary<string, string>, string, int)

AbstractAnalysisFactory.RequireBoolean(IDictionary<string, string>, string)

AbstractAnalysisFactory.GetBoolean(IDictionary<string, string>, string, bool)

AbstractAnalysisFactory.RequireSingle(IDictionary<string, string>, string)

AbstractAnalysisFactory.GetSingle(IDictionary<string, string>, string, float)

AbstractAnalysisFactory.RequireChar(IDictionary<string, string>, string)

AbstractAnalysisFactory.GetChar(IDictionary<string, string>, string, char)

AbstractAnalysisFactory.GetSet(IDictionary<string, string>, string)

AbstractAnalysisFactory.GetPattern(IDictionary<string, string>, string)

AbstractAnalysisFactory.GetCulture(IDictionary<string, string>, string, CultureInfo)

AbstractAnalysisFactory.GetWordSet(IResourceLoader, string, bool)

AbstractAnalysisFactory.GetLines(IResourceLoader, string)

AbstractAnalysisFactory.GetSnowballWordSet(IResourceLoader, string, bool)

AbstractAnalysisFactory.SplitFileNames(string)

AbstractAnalysisFactory.GetClassArg()

AbstractAnalysisFactory.IsExplicitLuceneMatchVersion

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Pattern

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public class PatternTokenizerFactory : TokenizerFactory

Constructors

PatternTokenizerFactory(IDictionary<string, string>)

Creates a new PatternTokenizerFactory.

Declaration

public PatternTokenizerFactory(IDictionary<string, string> args)

Parameters

Type	Name	Description
IDictionary<string, string>	args

Fields

GROUP

Declaration

public const string GROUP = "group"

Field Value

Type	Description
string

PATTERN

Declaration

public const string PATTERN = "pattern"

Field Value

Type	Description
string

m_group

Declaration

protected readonly int m_group

Field Value

Type	Description
int

m_pattern

Declaration

protected readonly Regex m_pattern

Field Value

Type	Description
Regex

Methods

Create(AttributeFactory, TextReader)

Splits the input using the configured pattern.

Declaration

public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)

Parameters

Type	Name	Description
AttributeSource.AttributeFactory	factory
TextReader	input

Returns

Type	Description
Tokenizer

Overrides

TokenizerFactory.Create(AttributeSource.AttributeFactory, TextReader)
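
As a rough illustration of how the factory might be used from code, the sketch below builds the argument dictionary, creates a tokenizer, and collects its terms. It assumes the Lucene.Net and Lucene.Net.Analysis.Common packages (4.8 API) are referenced; the class PatternTokenizerFactoryUsage and its Tokenize method are hypothetical names. It calls the inherited Create(TextReader) overload listed under Inherited Members rather than passing an AttributeFactory explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical wrapper showing one way to drive PatternTokenizerFactory.
public static class PatternTokenizerFactoryUsage
{
    public static List<string> Tokenize(string text, string pattern, int group)
    {
        // The constructor reads its settings from a string-to-string dictionary.
        var args = new Dictionary<string, string>
        {
            { PatternTokenizerFactory.PATTERN, pattern },
            { PatternTokenizerFactory.GROUP, group.ToString(CultureInfo.InvariantCulture) },
        };
        var factory = new PatternTokenizerFactory(args);

        var tokens = new List<string>();
        using (Tokenizer tokenizer = factory.Create(new StringReader(text)))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();
        }
        return tokens;
    }
}
```

Under these assumptions, PatternTokenizerFactoryUsage.Tokenize("aaa 'bbb' 'ccc'", @"'([^']+)'", 1) should return the tokens bbb and ccc, matching the group=1 example in the class description.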
