Class WordDelimiterFilter
Splits words into subwords and performs optional transformations on subword groups. Words are split into subwords with the following rules:
- split on intra-word delimiters (by default, all non-alphanumeric characters): "Wi-Fi"→"Wi", "Fi"
- split on case transitions: "PowerShot"→"Power", "Shot"
- split on letter-number transitions: "SD500"→"SD", "500"
- leading and trailing intra-word delimiters on each subword are ignored: "//hello---there, 'dude'"→"hello", "there", "dude"
- a trailing "'s" is removed from each subword: "O'Neil's"→"O", "Neil"
  Note: this step isn't performed in a separate filter because of possible subword combinations.
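The splitting rules above can be illustrated with a small standalone C# routine. This is a simplified, hypothetical sketch, not the Lucene.Net implementation. It classifies characters with char.IsUpper, char.IsLetter and char.IsDigit rather than a charTypeTable, and the class name SubwordSplitter is illustrative only.

```csharp
using System.Collections.Generic;
using System.Text;

// Simplified sketch of the subword-splitting rules; hypothetical, not the Lucene.Net implementation.
public static class SubwordSplitter
{
    // Character classes (assumption: derived from char.IsUpper/IsLetter/IsDigit, not a charTypeTable).
    private enum Kind { Delim, Lower, Upper, Digit }

    private static Kind KindOf(char c) =>
        char.IsUpper(c) ? Kind.Upper
        : char.IsLetter(c) ? Kind.Lower   // non-cased letters are treated as lowercase here
        : char.IsDigit(c) ? Kind.Digit
        : Kind.Delim;

    private static bool IsLetter(Kind k) => k == Kind.Lower || k == Kind.Upper;

    // Adjacent characters of different kinds start a new subword. The exception is an
    // uppercase letter followed by any letter, so "Power" is not split after "P".
    private static bool IsBreak(Kind previous, Kind current) =>
        previous != current && !(previous == Kind.Upper && IsLetter(current));

    // Removes "'s" or "'S" when it follows a letter and either ends the word or precedes a delimiter.
    private static string StripPossessives(string word)
    {
        var sb = new StringBuilder(word.Length);
        for (int i = 0; i < word.Length; i++)
        {
            bool possessive =
                word[i] == '\'' && i > 0 && char.IsLetter(word[i - 1]) &&
                i + 1 < word.Length && (word[i + 1] == 's' || word[i + 1] == 'S') &&
                (i + 2 == word.Length || KindOf(word[i + 2]) == Kind.Delim);
            if (possessive)
            {
                i++; // together with the loop increment, skips the apostrophe and the 's'
                continue;
            }
            sb.Append(word[i]);
        }
        return sb.ToString();
    }

    public static List<string> Split(string word)
    {
        string text = StripPossessives(word);
        var parts = new List<string>();
        int start = -1; // start index of the open subword, or -1 between subwords
        for (int i = 0; i < text.Length; i++)
        {
            Kind kind = KindOf(text[i]);
            if (kind == Kind.Delim)
            {
                // Delimiters close the open subword and are never emitted themselves.
                if (start >= 0) { parts.Add(text.Substring(start, i - start)); start = -1; }
            }
            else if (start < 0)
            {
                start = i; // first character of a new subword
            }
            else if (IsBreak(KindOf(text[i - 1]), kind))
            {
                parts.Add(text.Substring(start, i - start));
                start = i;
            }
        }
        if (start >= 0) parts.Add(text.Substring(start));
        return parts;
    }
}
```

The sketch strips possessives before splitting. If it did not, the apostrophe would act as a delimiter and leave a stray "s" subword.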
 
The combinations parameter affects how subwords are combined:
- combinations="0" causes no subword combinations: "PowerShot"→0:"Power", 1:"Shot" (0 and 1 are the token positions)
- combinations="1" means that, in addition to the subwords, maximal runs of non-numeric subwords are catenated and produced at the same position as the last subword in the run:
  - "PowerShot"→0:"Power", 1:"Shot", 1:"PowerShot"
  - "A's+B's&C's"→0:"A", 1:"B", 2:"C", 2:"ABC"
  - "Super-Duper-XL500-42-AutoCoder!"→0:"Super", 1:"Duper", 2:"XL", 2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"
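The position bookkeeping for combinations="1" can be illustrated with a small sketch. It assumes the subwords have already been produced and numbered 0, 1, 2, ... in order. SubwordCombiner, Combine and Format are hypothetical names. A subword counts as numeric here when every character is a digit. The sketch also assumes that a run containing a single subword produces no extra catenated token.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of combinations="1"; not the Lucene.Net implementation.
public static class SubwordCombiner
{
    private static bool IsNumeric(string s) => s.All(char.IsDigit);

    // Emits each subword at its own position. After each maximal run of two or more
    // non-numeric subwords, it also emits the run's catenation at the position of the
    // run's last subword.
    public static List<(int Position, string Text)> Combine(IReadOnlyList<string> subwords)
    {
        var output = new List<(int Position, string Text)>();
        var run = new List<string>();

        void FlushRun(int lastPosition)
        {
            if (run.Count > 1) output.Add((lastPosition, string.Concat(run)));
            run.Clear();
        }

        for (int pos = 0; pos < subwords.Count; pos++)
        {
            string sub = subwords[pos];
            if (IsNumeric(sub))
            {
                FlushRun(pos - 1); // a numeric subword ends any pending non-numeric run
                output.Add((pos, sub));
            }
            else
            {
                output.Add((pos, sub));
                run.Add(sub);
            }
        }
        FlushRun(subwords.Count - 1);
        return output;
    }

    // Renders tokens in the position:"text" notation used in the list above.
    public static string Format(IEnumerable<(int Position, string Text)> tokens) =>
        string.Join(", ", tokens.Select(t => $"{t.Position}:\"{t.Text}\""));
}
```

Given the subwords Super, Duper, XL, 500, 42, Auto and Coder, the sketch reproduces the positions listed for "Super-Duper-XL500-42-AutoCoder!" above.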
 
One use for WordDelimiterFilter is to help match words that use different subword delimiters. For example, if the source text contains "wi-fi", one may want the queries "wifi", "WiFi", "wi-fi" and "wi+fi" to all match. One way to do this is to specify combinations="1" in the analyzer used for indexing and combinations="0" (the default) in the analyzer used for querying. Because the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that does not do this (such as WhitespaceTokenizer).
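The index/query asymmetry described above can be simulated without Lucene.Net. This hypothetical sketch makes several simplifying assumptions:

- splitting covers only delimiters and lowercase-to-uppercase transitions;
- all terms are lowercased, as a lowercasing filter later in the chain would do;
- token positions are ignored;
- a query matches when every query term appears among the indexed terms.

WifiMatchDemo and its methods are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation; not Lucene.Net code. Positions are ignored and every term is lowercased.
public static class WifiMatchDemo
{
    // Simplified splitting: break on non-letter-or-digit characters and on lowercase->uppercase changes.
    private static List<string> Subwords(string word)
    {
        var parts = new List<string>();
        int start = -1; // -1 while no subword is open
        for (int i = 0; i <= word.Length; i++)
        {
            bool atDelim = i == word.Length || !char.IsLetterOrDigit(word[i]);
            bool caseBreak = !atDelim && start >= 0 && char.IsLower(word[i - 1]) && char.IsUpper(word[i]);
            if (start >= 0 && (atDelim || caseBreak))
            {
                parts.Add(word.Substring(start, i - start));
                start = -1;
            }
            if (!atDelim && start < 0) start = i;
        }
        return parts;
    }

    // Index side (like combinations="1"): the subwords plus their catenation.
    // Simplification: the whole word is treated as one non-numeric run, which holds
    // for the letter-only inputs used here.
    public static HashSet<string> IndexTerms(string word)
    {
        List<string> parts = Subwords(word);
        var terms = new HashSet<string>(parts.Select(p => p.ToLowerInvariant()));
        if (parts.Count > 1) terms.Add(string.Concat(parts).ToLowerInvariant());
        return terms;
    }

    // Query side (like combinations="0"): the subwords only.
    public static List<string> QueryTerms(string word) =>
        Subwords(word).Select(p => p.ToLowerInvariant()).ToList();

    // Simplified matching rule: every query term must have been indexed.
    public static bool Matches(string indexedText, string query)
    {
        HashSet<string> indexed = IndexTerms(indexedText);
        return QueryTerms(query).All(term => indexed.Contains(term));
    }
}
```

Under these assumptions, indexing "wi-fi" yields the terms wi, fi and wifi. The catenated wifi term serves the "wifi" query, and the split terms serve "WiFi", "wi-fi" and "wi+fi".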
Inheritance: Lucene.Net.Analysis.TokenFilter → WordDelimiterFilter
Implements: System.IDisposable
Namespace: Lucene.Net.Analysis.Miscellaneous
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class WordDelimiterFilter : TokenFilter, IDisposable
Constructors
WordDelimiterFilter(LuceneVersion, TokenStream, WordDelimiterFlags, CharArraySet)
Creates a new WordDelimiterFilter using DEFAULT_WORD_DELIM_TABLE as its charTypeTable
Declaration
public WordDelimiterFilter(LuceneVersion matchVersion, TokenStream in, WordDelimiterFlags configurationFlags, CharArraySet protWords)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene compatibility version | 
| Lucene.Net.Analysis.TokenStream | in | Lucene.Net.Analysis.TokenStream to be filtered | 
| WordDelimiterFlags | configurationFlags | Flags configuring the filter | 
| CharArraySet | protWords | If not null, the set of tokens to protect from being delimited | 
WordDelimiterFilter(LuceneVersion, TokenStream, Byte[], WordDelimiterFlags, CharArraySet)
Creates a new WordDelimiterFilter
Declaration
public WordDelimiterFilter(LuceneVersion matchVersion, TokenStream in, byte[] charTypeTable, WordDelimiterFlags configurationFlags, CharArraySet protWords)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene compatibility version | 
| Lucene.Net.Analysis.TokenStream | in | TokenStream to be filtered | 
| System.Byte[] | charTypeTable | Table containing character types | 
| WordDelimiterFlags | configurationFlags | Flags configuring the filter | 
| CharArraySet | protWords | If not null, the set of tokens to protect from being delimited | 
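A custom charTypeTable for this constructor might be built as shown below. This is a hypothetical sketch that makes three assumptions:

- the table is indexed by character code and holds the flag values listed under Fields;
- characters that are not lowercase letters, uppercase letters or digits are marked SUBWORD_DELIM;
- a table size of 256 is suitable.

CharTypeTables and Build are illustrative names.

```csharp
// Hypothetical sketch of building a charTypeTable; not the Lucene.Net implementation.
public static class CharTypeTables
{
    // Flag values copied from the Fields section of this page.
    public const byte LOWER = 1, UPPER = 2, DIGIT = 4, SUBWORD_DELIM = 8;
    public const byte ALPHA = LOWER | UPPER;

    // Assumption: one entry per character code from 0 to size - 1.
    public static byte[] Build(int size = 256)
    {
        var table = new byte[size];
        for (int i = 0; i < size; i++)
        {
            char c = (char)i;
            table[i] = char.IsLower(c) ? LOWER
                     : char.IsUpper(c) ? UPPER
                     : char.IsDigit(c) ? DIGIT
                     : SUBWORD_DELIM; // everything else is treated as a delimiter
        }
        return table;
    }
}
```

A caller could then adjust individual entries before passing the array as the charTypeTable argument. For example, `table['#'] = CharTypeTables.ALPHA;` marks '#' as a letter instead of a delimiter.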
Fields
ALPHA
Declaration
public const int ALPHA = 3
Field Value
| Type | Description | 
|---|---|
| System.Int32 | Character-type mask for letters; combines the LOWER and UPPER flags | 
ALPHANUM
Declaration
public const int ALPHANUM = 7
Field Value
| Type | Description | 
|---|---|
| System.Int32 | Character-type mask for letters and digits; combines the ALPHA and DIGIT flags | 
DIGIT
Declaration
public const int DIGIT = 4
Field Value
| Type | Description | 
|---|---|
| System.Int32 | Character-type flag for digits | 
LOWER
Declaration
public const int LOWER = 1
Field Value
| Type | Description | 
|---|---|
| System.Int32 | Character-type flag for lowercase letters | 
SUBWORD_DELIM
Declaration
public const int SUBWORD_DELIM = 8
Field Value
| Type | Description | 
|---|---|
| System.Int32 | Character-type flag for subword delimiters | 
UPPER
Declaration
public const int UPPER = 2
Field Value
| Type | Description | 
|---|---|
| System.Int32 | Character-type flag for uppercase letters | 
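The values above behave as bit flags. ALPHA (3) is LOWER (1) combined with UPPER (2), and ALPHANUM (7) is ALPHA combined with DIGIT (4). The sketch below restates the documented values and adds hypothetical helper predicates (IsAlpha, IsDigit, IsSubwordDelim) to show that arithmetic. It assumes, as the values suggest, that class membership is tested with a bitwise AND. The helpers are illustrative and are not part of this class's documented API.

```csharp
// Hypothetical helpers; the constant values are copied from this page.
public static class CharTypeFlags
{
    public const int LOWER = 1;          // binary 0001
    public const int UPPER = 2;          // binary 0010
    public const int DIGIT = 4;          // binary 0100
    public const int SUBWORD_DELIM = 8;  // binary 1000
    public const int ALPHA = LOWER | UPPER;     // 3
    public const int ALPHANUM = ALPHA | DIGIT;  // 7

    // A type belongs to a class when it shares at least one bit with that class's mask.
    public static bool IsAlpha(int type) => (type & ALPHA) != 0;
    public static bool IsDigit(int type) => (type & DIGIT) != 0;
    public static bool IsSubwordDelim(int type) => (type & SUBWORD_DELIM) != 0;
}
```

SUBWORD_DELIM shares no bit with ALPHANUM, so under this scheme a delimiter type never also tests as a letter or digit.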
Methods
IncrementToken()
Declaration
public override bool IncrementToken()
Returns
| Type | Description | 
|---|---|
| System.Boolean | false at the end of the stream; true otherwise | 
Overrides
Lucene.Net.Analysis.TokenStream.IncrementToken()
Reset()
Declaration
public override void Reset()