Class WordDelimiterFilter
Splits words into subwords and performs optional transformations on subword groups. Words are split into subwords with the following rules:
- split on intra-word delimiters (by default, all non-alphanumeric characters): "Wi-Fi" → "Wi", "Fi"
- split on case transitions: "PowerShot" → "Power", "Shot"
- split on letter-number transitions: "SD500" → "SD", "500"
- leading and trailing intra-word delimiters on each subword are ignored: "//hello---there, 'dude'" → "hello", "there", "dude"
- trailing "'s" are removed for each subword: "O'Neil's" → "O", "Neil"
  - Note: this step isn't performed in a separate filter because of possible subword combinations.
 
 
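The splitting rules above can be approximated in a few lines of plain C#. This is a minimal sketch, not the filter's actual implementation: the names `SubwordSplitSketch`, `Split`, `Classify`, and `IsPossessiveAt` are hypothetical, character classification via `char.IsUpper`/`char.IsDigit` is an assumption, and every rule is always applied rather than being controlled by configuration flags.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class SubwordSplitSketch
{
    private enum Kind { Delim, Lower, Upper, Digit }

    private static Kind Classify(char c)
    {
        if (char.IsDigit(c)) return Kind.Digit;
        if (char.IsUpper(c)) return Kind.Upper;
        if (char.IsLetter(c)) return Kind.Lower; // lowercase and caseless letters
        return Kind.Delim;                       // everything else delimits
    }

    // True when word[i..] is "'s" or "'S" followed by a delimiter or the end.
    private static bool IsPossessiveAt(string word, int i)
    {
        return i + 1 < word.Length
            && word[i] == '\''
            && (word[i + 1] == 's' || word[i + 1] == 'S')
            && (i + 2 == word.Length || Classify(word[i + 2]) == Kind.Delim);
    }

    public static List<string> Split(string word)
    {
        var parts = new List<string>();
        int start = -1; // start index of the current subword, or -1 if none is open
        for (int i = 0; i <= word.Length; i++)
        {
            // Treat the end of the input as one final delimiter.
            Kind cur = i < word.Length ? Classify(word[i]) : Kind.Delim;
            if (start >= 0)
            {
                Kind prev = Classify(word[i - 1]);
                bool boundary =
                    cur == Kind.Delim                                 // intra-word delimiter
                    || (prev == Kind.Lower && cur == Kind.Upper)      // case transition
                    || ((prev == Kind.Digit) != (cur == Kind.Digit)); // letter-number transition
                if (boundary)
                {
                    parts.Add(word.Substring(start, i - start));
                    start = -1;
                    if (IsPossessiveAt(word, i)) i++; // skip the "s" of a trailing "'s"
                }
            }
            if (start < 0 && cur != Kind.Delim) start = i;
        }
        return parts;
    }
}
```

Under these assumptions, `SubwordSplitSketch.Split("SD500")` yields "SD" and "500", and `SubwordSplitSketch.Split("O'Neil's")` yields "O" and "Neil".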
The combinations parameter affects how subwords are combined:
- combinations="0" causes no subword combinations: "PowerShot" → 0:"Power", 1:"Shot" (0 and 1 are the token positions)
- combinations="1" means that in addition to the subwords, maximum runs of non-numeric subwords are catenated and produced at the same position of the last subword in the run:
  - "PowerShot" → 0:"Power", 1:"Shot", 1:"PowerShot"
  - "A's+B's&C's" → 0:"A", 1:"B", 2:"C", 2:"ABC"
  - "Super-Duper-XL500-42-AutoCoder!" → 0:"Super", 1:"Duper", 2:"XL", 2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"
 
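The effect of combinations="1" can be sketched as a pass over subwords that have already been split. This illustrates the positions listed above and is not the filter's implementation: `CatenationSketch` and `WithCatenation` are hypothetical names, and a subword is assumed to be numeric when its first character is a digit.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class CatenationSketch
{
    // Returns (position, term) pairs: every subword, plus one catenated term
    // for each run of two or more consecutive non-numeric subwords.
    public static List<(int Position, string Term)> WithCatenation(IReadOnlyList<string> subwords)
    {
        var output = new List<(int Position, string Term)>();
        var run = new StringBuilder();
        int runLength = 0;
        for (int pos = 0; pos < subwords.Count; pos++)
        {
            string part = subwords[pos];
            output.Add((pos, part));
            if (IsNumeric(part)) continue; // numeric subwords never join a run

            run.Append(part);
            runLength++;
            bool runEnds = pos == subwords.Count - 1 || IsNumeric(subwords[pos + 1]);
            if (runEnds)
            {
                // The catenated term shares the position of the run's last subword.
                if (runLength > 1) output.Add((pos, run.ToString()));
                run.Clear();
                runLength = 0;
            }
        }
        return output;
    }

    // Assumption: a subword counts as numeric when it starts with a digit.
    private static bool IsNumeric(string subword)
    {
        return subword.Length > 0 && char.IsDigit(subword[0]);
    }
}
```

A run of length one is not emitted a second time, which is why "500" and "42" in the last example above appear only once.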
One use for WordDelimiterFilter is to help match words with different subword delimiters. For example, if the source text contained "wi-fi", one may want queries for "wifi", "WiFi", "wi-fi", and "wi+fi" to all match. One way of doing so is to specify combinations="1" in the analyzer used for indexing, and combinations="0" (the default) in the analyzer used for querying. Because the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that does not do this (such as WhitespaceTokenizer).
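The index-time and query-time setup described above might look like the following with the constructors documented on this page. Treat it as a sketch under several assumptions: the constructors take `WordDelimiterFlags` rather than a combinations value, and the flag names used here (`GENERATE_WORD_PARTS`, `CATENATE_WORDS`, and so on), `LuceneVersion.LUCENE_48`, `Analyzer.NewAnonymous`, and `LowerCaseFilter` are taken from Lucene.NET 4.8 and are not listed on this page. `WordDelimiterAnalyzers` is a hypothetical name. The lowercase filter is an addition so that "WiFi" and "wi-fi" produce the same terms.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Util;

// Hypothetical helper for illustration.
public static class WordDelimiterAnalyzers
{
    private const LuceneVersion Version = LuceneVersion.LUCENE_48;

    // Assumed flag names from the WordDelimiterFlags enum: split only, no catenation.
    private const WordDelimiterFlags SplitFlags =
        WordDelimiterFlags.GENERATE_WORD_PARTS
        | WordDelimiterFlags.GENERATE_NUMBER_PARTS
        | WordDelimiterFlags.SPLIT_ON_CASE_CHANGE
        | WordDelimiterFlags.SPLIT_ON_NUMERICS
        | WordDelimiterFlags.STEM_ENGLISH_POSSESSIVE;

    // Index time: subwords plus catenated runs (roughly combinations="1").
    public static Analyzer ForIndexing()
    {
        return Build(SplitFlags | WordDelimiterFlags.CATENATE_WORDS);
    }

    // Query time: subwords only (roughly combinations="0").
    public static Analyzer ForQuerying()
    {
        return Build(SplitFlags);
    }

    private static Analyzer Build(WordDelimiterFlags flags)
    {
        return Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            // WhitespaceTokenizer keeps intra-word delimiters for the filter to see.
            Tokenizer source = new WhitespaceTokenizer(Version, reader);
            TokenStream result = new WordDelimiterFilter(Version, source, flags, null);
            result = new LowerCaseFilter(Version, result);
            return new TokenStreamComponents(source, result);
        });
    }
}
```

With this arrangement, indexing "wi-fi" is expected to store the subwords and the catenated form, so a query that analyzes to either the parts or the joined word can find it.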
Namespace: Lucene.Net.Analysis.Miscellaneous
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class WordDelimiterFilter : TokenFilter, IDisposable
  Constructors
WordDelimiterFilter(LuceneVersion, TokenStream, WordDelimiterFlags, CharArraySet)
Creates a new WordDelimiterFilter using DEFAULT_WORD_DELIM_TABLE as its charTypeTable
Declaration
public WordDelimiterFilter(LuceneVersion matchVersion, TokenStream @in, WordDelimiterFlags configurationFlags, CharArraySet protWords)
  Parameters
| Type | Name | Description |
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene compatibility version |
| Lucene.Net.Analysis.TokenStream | in | TokenStream to be filtered |
| WordDelimiterFlags | configurationFlags | Flags configuring the filter |
| CharArraySet | protWords | If not null, the set of tokens to protect from being delimited |
      
WordDelimiterFilter(LuceneVersion, TokenStream, Byte[], WordDelimiterFlags, CharArraySet)
Creates a new WordDelimiterFilter
Declaration
public WordDelimiterFilter(LuceneVersion matchVersion, TokenStream @in, byte[] charTypeTable, WordDelimiterFlags configurationFlags, CharArraySet protWords)
  Parameters
| Type | Name | Description |
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene compatibility version |
| Lucene.Net.Analysis.TokenStream | in | TokenStream to be filtered |
| System.Byte[] | charTypeTable | Table containing character types |
| WordDelimiterFlags | configurationFlags | Flags configuring the filter |
| CharArraySet | protWords | If not null, the set of tokens to protect from being delimited |
      
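A custom charTypeTable could be built along the following lines. This is a sketch under a loudly stated assumption: that the filter looks up a character's type by using its code as an index into the table, which this page does not state. `CharTypeTableSketch` and `Build` are hypothetical names, the constants mirror the LOWER, UPPER, DIGIT, SUBWORD_DELIM, and ALPHA field values documented on this page, and the 256-entry size is a choice made for the example.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class CharTypeTableSketch
{
    // Values mirror the LOWER, UPPER, DIGIT, SUBWORD_DELIM and ALPHA fields.
    private const byte Lower = 1, Upper = 2, Digit = 4, SubwordDelim = 8, Alpha = 3;

    // Builds a 256-entry table indexed by character code (assumed lookup scheme).
    // Characters listed in extraAlpha are classified as ALPHA instead of as delimiters.
    public static byte[] Build(params char[] extraAlpha)
    {
        var table = new byte[256];
        for (int i = 0; i < table.Length; i++)
        {
            char c = (char)i;
            if (char.IsLower(c)) table[i] = Lower;
            else if (char.IsUpper(c)) table[i] = Upper;
            else if (char.IsDigit(c)) table[i] = Digit;
            else table[i] = SubwordDelim;
        }
        foreach (char c in extraAlpha)
        {
            if (c >= table.Length)
                throw new ArgumentOutOfRangeException(nameof(extraAlpha), "Only characters below 256 fit in this table.");
            table[c] = Alpha;
        }
        return table;
    }
}
```

The array returned by `CharTypeTableSketch.Build('#')` could then be passed as the charTypeTable argument of the five-parameter constructor, so that '#' is typed as a letter rather than as a subword delimiter.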
Fields
ALPHA
Declaration
public const int ALPHA = 3
  Field Value
| Type | Description | 
|---|---|
| System.Int32 | 
ALPHANUM
Declaration
public const int ALPHANUM = 7
  Field Value
| Type | Description | 
|---|---|
| System.Int32 | 
DIGIT
Declaration
public const int DIGIT = 4
  Field Value
| Type | Description | 
|---|---|
| System.Int32 | 
LOWER
Declaration
public const int LOWER = 1
  Field Value
| Type | Description | 
|---|---|
| System.Int32 | 
SUBWORD_DELIM
Declaration
public const int SUBWORD_DELIM = 8
  Field Value
| Type | Description | 
|---|---|
| System.Int32 | 
UPPER
Declaration
public const int UPPER = 2
  Field Value
| Type | Description | 
|---|---|
| System.Int32 | 
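The field values suggest a bit-mask layout: ALPHA (3) equals LOWER | UPPER, and ALPHANUM (7) equals ALPHA | DIGIT. That reading is an inference from the numbers, not something this page states. The sketch below mirrors the documented values in a hypothetical `CharTypeBits` class and shows how such masks are typically tested.

```csharp
// Hypothetical helper for illustration; the constants mirror the field values documented above.
public static class CharTypeBits
{
    public const int LOWER = 1;
    public const int UPPER = 2;
    public const int DIGIT = 4;
    public const int SUBWORD_DELIM = 8;
    public const int ALPHA = LOWER | UPPER;    // 3
    public const int ALPHANUM = ALPHA | DIGIT; // 7

    // A type is alphabetic if it has the LOWER bit, the UPPER bit, or both.
    public static bool IsAlpha(int type) { return (type & ALPHA) != 0; }

    public static bool IsDigit(int type) { return (type & DIGIT) != 0; }

    public static bool IsSubwordDelim(int type) { return (type & SUBWORD_DELIM) != 0; }
}
```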
Methods
IncrementToken()
Declaration
public override bool IncrementToken()
  Returns
| Type | Description | 
|---|---|
| System.Boolean | 
Overrides
Reset()
Declaration
public override void Reset()
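Reset() and IncrementToken() are usually called as part of the general TokenStream consumer loop rather than on their own. The sketch below shows that loop with a hypothetical `WordDelimiterDemo.Tokenize` helper. Names not found on this page are assumptions taken from Lucene.NET 4.8: `WhitespaceTokenizer`, `ICharTermAttribute`, `AddAttribute<T>()`, `End()`, `LuceneVersion.LUCENE_48`, and the `WordDelimiterFlags` member names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper for illustration.
public static class WordDelimiterDemo
{
    public static List<string> Tokenize(string text, WordDelimiterFlags flags)
    {
        var terms = new List<string>();
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        Tokenizer source = new WhitespaceTokenizer(version, new StringReader(text));

        // The filter is IDisposable (see the class declaration), so wrap it in using.
        using (TokenStream stream = new WordDelimiterFilter(version, source, flags, null))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                  // must be called before the first IncrementToken()
            while (stream.IncrementToken())  // returns false when the stream is exhausted
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

For example, `WordDelimiterDemo.Tokenize("PowerShot", WordDelimiterFlags.GENERATE_WORD_PARTS | WordDelimiterFlags.SPLIT_ON_CASE_CHANGE)` is intended to reproduce the "Power", "Shot" split described at the top of this page.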