Class PatternTokenizerFactory
Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" says which group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output (without empty tokens) of: System.Text.RegularExpressions.Regex.Split(System.String)
Using group >= 0 selects the matching group as the token.  For example, if you have:
    pattern = \'([^\']+)\'
    group = 0
    input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
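The group semantics described above can be sketched with plain System.Text.RegularExpressions, without any Lucene.NET types. This is only an illustration under stated assumptions: PatternGroupSketch and Tokenize are hypothetical names introduced here, not part of the library, and the split branch uses a pattern without capture groups because Regex.Split also returns captured text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.NET: it mimics the documented
// "pattern"/"group" behaviour using only the .NET regex classes.
public static class PatternGroupSketch
{
    public static List<string> Tokenize(string pattern, int group, string input)
    {
        var regex = new Regex(pattern);
        var tokens = new List<string>();

        if (group < 0)
        {
            // group = -1: "split" mode, the pattern matches the delimiters.
            foreach (string part in regex.Split(input))
            {
                // Zero-length tokens are never emitted.
                if (part.Length > 0) tokens.Add(part);
            }
        }
        else
        {
            // group >= 0: each match contributes the text of the selected group.
            foreach (Match match in regex.Matches(input))
            {
                string value = match.Groups[group].Value;
                if (value.Length > 0) tokens.Add(value);
            }
        }

        return tokens;
    }

    public static void Main()
    {
        const string pattern = @"\'([^\']+)\'";
        const string input = "aaa 'bbb' 'ccc'";

        Console.WriteLine(string.Join(" | ", Tokenize(pattern, 0, input)));   // 'bbb' | 'ccc'
        Console.WriteLine(string.Join(" | ", Tokenize(pattern, 1, input)));   // bbb | ccc
        Console.WriteLine(string.Join(" | ", Tokenize(@"\s+", -1, input)));   // aaa | 'bbb' | 'ccc'
    }
}
```

The last call uses a whitespace pattern (an assumption made for the illustration) to show split mode, where the pattern describes what separates tokens rather than the tokens themselves.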
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
  </analyzer>
</fieldType>
@since solr1.2
Namespace: Lucene.Net.Analysis.Pattern
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class PatternTokenizerFactory : TokenizerFactory
Constructors
PatternTokenizerFactory(IDictionary<String, String>)
Creates a new PatternTokenizerFactory
Declaration
public PatternTokenizerFactory(IDictionary<string, string> args)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.IDictionary<System.String, System.String> | args | Factory arguments, such as "pattern" and "group". |
Fields
GROUP
Declaration
public const string GROUP = "group"
Field Value
| Type | Description |
|---|---|
| System.String | |
m_group
Declaration
protected readonly int m_group
Field Value
| Type | Description |
|---|---|
| System.Int32 | |
m_pattern
Declaration
protected readonly Regex m_pattern
Field Value
| Type | Description |
|---|---|
| System.Text.RegularExpressions.Regex | |
PATTERN
Declaration
public const string PATTERN = "pattern"
Field Value
| Type | Description |
|---|---|
| System.String | |
Methods
Create(AttributeSource.AttributeFactory, TextReader)
Splits the input using the configured pattern.
Declaration
public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
Parameters
| Type | Name | Description |
|---|---|---|
| Lucene.Net.Util.AttributeSource.AttributeFactory | factory | |
| System.IO.TextReader | input | |
Returns
| Type | Description | 
|---|---|
| Lucene.Net.Analysis.Tokenizer | |
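As a rough sketch of how the constructor and Create fit together, the following builds the factory from an argument dictionary and reads the resulting tokens. It assumes the Lucene.Net and Lucene.Net.Analysis.Common packages (4.8 line) are referenced. The class name PatternTokenizerFactoryUsage is hypothetical, and the token-consumption loop follows the usual TokenStream workflow rather than anything stated on this page.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical usage sketch; not taken from the Lucene.NET sources.
public static class PatternTokenizerFactoryUsage
{
    public static void Main()
    {
        // A mutable dictionary is used on the assumption that the factory
        // removes the entries it recognizes while reading them.
        var args = new Dictionary<string, string>
        {
            { PatternTokenizerFactory.PATTERN, @"\'([^\']+)\'" },
            { PatternTokenizerFactory.GROUP, "1" }
        };
        var factory = new PatternTokenizerFactory(args);

        TextReader reader = new StringReader("aaa 'bbb' 'ccc'");

        // Disposing the tokenizer also closes the underlying reader.
        using (Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, reader))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();
        }
    }
}
```

With group set to 1, the description at the top of this page suggests the loop should print bbb and ccc; changing the "group" entry to "0" should keep the surrounding ' marks.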