Class PatternTokenizerFactory
Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" specifies which capture group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output (without empty tokens) of: System.Text.RegularExpressions.Regex.Split(System.String,System.String)
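The split behaviour described above can be approximated with plain System.Text.RegularExpressions. This is a minimal sketch, not the tokenizer's actual implementation: the class and method names are hypothetical, and it assumes a pattern without capturing groups, because .NET's Regex.Split also returns captured text in its result array.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SplitModeSketch
{
    // Approximates group = -1: split the input on the pattern
    // and discard zero-length pieces.
    public static List<string> Tokenize(string input, string pattern)
    {
        var tokens = new List<string>();
        foreach (string piece in Regex.Split(input, pattern))
        {
            if (piece.Length > 0)
            {
                tokens.Add(piece);
            }
        }
        return tokens;
    }
}
```

For instance, SplitModeSketch.Tokenize("a, b,,c", @"\s*,\s*") yields a, b and c; the empty piece between the two adjacent commas is dropped.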
Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (no ' marks).
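The group selection in this example can be reproduced with Regex.Matches. The sketch below is an illustration under stated assumptions rather than the tokenizer's own code: the class and method names are hypothetical, and the pattern is written as a C# string literal, so the backslash escapes around the ' marks are not needed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class GroupModeSketch
{
    // Approximates group >= 0: each match contributes the text of the
    // selected group as one token; zero-length values are skipped.
    public static List<string> Tokenize(string input, string pattern, int group)
    {
        var tokens = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            string value = match.Groups[group].Value;
            if (value.Length > 0)
            {
                tokens.Add(value);
            }
        }
        return tokens;
    }
}
```

Calling GroupModeSketch.Tokenize("aaa 'bbb' 'ccc'", "'([^']+)'", 0) gives 'bbb' and 'ccc', while passing 1 as the group gives bbb and ccc.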
NOTE: This Tokenizer does not output tokens that are of zero length.
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
</analyzer>
</fieldType>
@since solr1.2
Namespace: Lucene.Net.Analysis.Pattern
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class PatternTokenizerFactory : TokenizerFactory
Constructors
PatternTokenizerFactory(IDictionary<String, String>)
Creates a new PatternTokenizerFactory.
Declaration
public PatternTokenizerFactory(IDictionary<string, string> args)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IDictionary<System.String, System.String> | args |
Fields
GROUP
Declaration
public const string GROUP = "group"
Field Value
Type | Description |
---|---|
System.String |
m_group
Declaration
protected readonly int m_group
Field Value
Type | Description |
---|---|
System.Int32 |
m_pattern
Declaration
protected readonly Regex m_pattern
Field Value
Type | Description |
---|---|
System.Text.RegularExpressions.Regex |
PATTERN
Declaration
public const string PATTERN = "pattern"
Field Value
Type | Description |
---|---|
System.String |
Methods
Create(AttributeSource.AttributeFactory, TextReader)
Creates a PatternTokenizer that splits the input using the configured pattern and group.
Declaration
public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Util.AttributeSource.AttributeFactory | factory | |
System.IO.TextReader | input |
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Tokenizer |
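Putting the constructor and Create together, a factory could be used from code roughly as follows. This is a hedged sketch, not an authoritative recipe: the wrapper class and method are hypothetical, and it assumes that the default attribute factory (AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY) is acceptable, that no other arguments such as a match version are required, and that tokens are read through ICharTermAttribute with the usual Reset / IncrementToken / End sequence.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Pattern;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical wrapper, for illustration only.
public static class PatternTokenizerFactorySketch
{
    public static List<string> Tokenize(string text, string pattern, int group)
    {
        // Keys are the PATTERN and GROUP constants documented above.
        // A fresh dictionary is passed in (assumption: the factory may
        // consume the entries it recognizes).
        var args = new Dictionary<string, string>
        {
            { PatternTokenizerFactory.PATTERN, pattern },
            { PatternTokenizerFactory.GROUP, group.ToString(CultureInfo.InvariantCulture) }
        };
        var factory = new PatternTokenizerFactory(args);

        var tokens = new List<string>();
        using (Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader(text)))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();
        }
        return tokens;
    }
}
```

With the pattern '([^']+)' and group 1, the input aaa 'bbb' 'ccc' should produce the tokens bbb and ccc, matching the example in the class description.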