Class DictionaryCompoundWordTokenFilter
A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.
For example, "Donaudampfschiff" is decomposed into Donau, dampf, and schiff, so that a search for "schiff" alone also finds "Donaudampfschiff". The decomposition uses a brute-force algorithm.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CompoundWordTokenFilterBase:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
Namespace: Lucene.Net.Analysis.Compound
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class DictionaryCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable
Constructors
DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet)
Creates a new DictionaryCompoundWordTokenFilter
Declaration
public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details. | 
| Lucene.Net.Analysis.TokenStream | input | the Lucene.Net.Analysis.TokenStream to process | 
| CharArraySet | dictionary | the word dictionary to match against. | 
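The following is a minimal usage sketch for this constructor, not an official sample. It assumes Lucene.Net 4.8 with the Lucene.Net.Analysis.Common package referenced, uses `LuceneVersion.LUCENE_48` as the match version, and feeds the filter from a `WhitespaceTokenizer`. The class name `CompoundDemo` and the three-word dictionary are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical demo class; assumes Lucene.Net 4.8 and Lucene.Net.Analysis.Common.
public static class CompoundDemo
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Dictionary of known subwords. The last argument (ignoreCase: true)
        // makes lookups case-insensitive.
        var dictionary = new CharArraySet(
            version, new[] { "donau", "dampf", "schiff" }, true);

        using (var reader = new StringReader("Donaudampfschiff"))
        {
            Tokenizer tokenizer = new WhitespaceTokenizer(version, reader);

            // Wrap the tokenizer with the compound word filter.
            using (TokenStream stream =
                new DictionaryCompoundWordTokenFilter(version, tokenizer, dictionary))
            {
                ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

                stream.Reset();
                while (stream.IncrementToken())
                {
                    // Print every token the filter emits.
                    Console.WriteLine(term.ToString());
                }
                stream.End();
            }
        }
    }
}
```

Disposing the outermost `TokenStream` also closes the wrapped tokenizer, which is why only the filter is placed in a `using` block here.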
DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet, Int32, Int32, Int32, Boolean)
Creates a new DictionaryCompoundWordTokenFilter
Declaration
public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.LuceneVersion | matchVersion | Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details. | 
| Lucene.Net.Analysis.TokenStream | input | the Lucene.Net.Analysis.TokenStream to process | 
| CharArraySet | dictionary | the word dictionary to match against. | 
| System.Int32 | minWordSize | only words at least this long are processed | 
| System.Int32 | minSubwordSize | only subwords at least this long are added to the output stream | 
| System.Int32 | maxSubwordSize | only subwords no longer than this are added to the output stream | 
| System.Boolean | onlyLongestMatch | Add only the longest matching subword to the stream | 
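To make the interaction of these four parameters concrete, here is a self-contained sketch of a brute-force dictionary scan of the kind described above. It is an illustration only, not Lucene.Net's implementation: the class `BruteForceDecompounder`, its `Decompose` helper, and the default values are hypothetical, and the sketch assumes the size limits are inclusive bounds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class BruteForceDecompounder
{
    public static List<string> Decompose(
        string word,
        ISet<string> dictionary,
        int minWordSize = 5,        // illustrative default
        int minSubwordSize = 2,     // illustrative default
        int maxSubwordSize = 15,    // illustrative default
        bool onlyLongestMatch = false)
    {
        var subwords = new List<string>();

        // Words shorter than minWordSize are not decomposed at all.
        if (word.Length < minWordSize)
        {
            return subwords;
        }

        // Try every start offset and every allowed subword length.
        for (int start = 0; start <= word.Length - minSubwordSize; start++)
        {
            string longest = null;

            for (int length = minSubwordSize; length <= maxSubwordSize; length++)
            {
                if (start + length > word.Length)
                {
                    break;
                }

                string candidate = word.Substring(start, length);
                if (!dictionary.Contains(candidate))
                {
                    continue;
                }

                if (onlyLongestMatch)
                {
                    // Lengths only grow, so a later match is always longer.
                    longest = candidate;
                }
                else
                {
                    subwords.Add(candidate);
                }
            }

            // With onlyLongestMatch, emit at most one subword per start offset.
            if (longest != null)
            {
                subwords.Add(longest);
            }
        }

        return subwords;
    }

    public static void Main()
    {
        var dictionary = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "donau", "dampf", "schiff"
        };

        List<string> parts = Decompose("Donaudampfschiff", dictionary);
        Console.WriteLine(string.Join(", ", parts)); // prints "Donau, dampf, schiff"
    }
}
```

The scan checks every offset against every permitted length, so its cost grows with the word length times the range between `minSubwordSize` and `maxSubwordSize`; narrowing that range is the simplest way to limit both the work done and the number of spurious short matches.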
Methods
Decompose()
Declaration
protected override void Decompose()