Class DictionaryCompoundWordTokenFilter
A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.
"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating a CompoundWordTokenFilterBase subclass such as this filter:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
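A minimal usage sketch, assuming Lucene.NET 4.8; the dictionary contents and the sample text are illustrative only, not part of the API:

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

var matchVersion = LuceneVersion.LUCENE_48;

// Dictionary of word parts to match against (case-insensitive).
var dictionary = new CharArraySet(
    matchVersion, new[] { "donau", "dampf", "schiff" }, true);

using TokenStream stream = new DictionaryCompoundWordTokenFilter(
    matchVersion,
    new WhitespaceTokenizer(matchVersion, new StringReader("Donaudampfschiff")),
    dictionary);

ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
stream.Reset();
while (stream.IncrementToken())
{
    // Emits the original token first, then each matched subword:
    // Donaudampfschiff, Donau, dampf, schiff
    Console.WriteLine(termAtt.ToString());
}
stream.End();
```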
Inheritance
object → Lucene.Net.Util.AttributeSource → Lucene.Net.Analysis.TokenStream → Lucene.Net.Analysis.TokenFilter → CompoundWordTokenFilterBase → DictionaryCompoundWordTokenFilter
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Compound
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class DictionaryCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable
Constructors
DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet)
Creates a new DictionaryCompoundWordTokenFilter
Declaration
public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| LuceneVersion | matchVersion | Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details. |
| TokenStream | input | The Lucene.Net.Analysis.TokenStream to process. |
| CharArraySet | dictionary | The word dictionary to match against. |
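When no sizes are supplied, this overload presumably falls back to the base-class defaults. A hedged sketch of the equivalent call, assuming the CompoundWordTokenFilterBase constants DEFAULT_MIN_WORD_SIZE (5), DEFAULT_MIN_SUBWORD_SIZE (2), and DEFAULT_MAX_SUBWORD_SIZE (15) and onlyLongestMatch = false:

```csharp
// Presumed equivalent call via the extended overload, using the
// CompoundWordTokenFilterBase default size constants (an assumption
// about this overload's behavior, not a guarantee).
var filter = new DictionaryCompoundWordTokenFilter(
    matchVersion, input, dictionary,
    CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE,     // 5
    CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE,  // 2
    CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE,  // 15
    false);                                                // onlyLongestMatch
```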
DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet, int, int, int, bool)
Creates a new DictionaryCompoundWordTokenFilter
Declaration
public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| LuceneVersion | matchVersion | Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details. |
| TokenStream | input | The Lucene.Net.Analysis.TokenStream to process. |
| CharArraySet | dictionary | The word dictionary to match against. |
| int | minWordSize | Only words longer than this many characters get processed. |
| int | minSubwordSize | Only subwords longer than this many characters reach the output stream. |
| int | maxSubwordSize | Only subwords shorter than this many characters reach the output stream. |
| bool | onlyLongestMatch | If true, add only the longest matching subword to the stream. |
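A hedged sketch of wiring this overload into an analysis chain via Analyzer.NewAnonymous, assuming Lucene.NET 4.8; the dictionary contents and the size values are illustrative only:

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

var matchVersion = LuceneVersion.LUCENE_48;
var dictionary = new CharArraySet(
    matchVersion, new[] { "donau", "dampf", "schiff" }, true);

Analyzer analyzer = Analyzer.NewAnonymous((fieldName, reader) =>
{
    Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
    // Keep only the longest dictionary match per start position and use
    // custom size bounds instead of the base-class defaults.
    TokenStream result = new DictionaryCompoundWordTokenFilter(
        matchVersion, source, dictionary,
        minWordSize: 5, minSubwordSize: 3, maxSubwordSize: 15,
        onlyLongestMatch: true);
    return new TokenStreamComponents(source, result);
});
```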
Methods
Decompose()
Decomposes the current m_termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the m_tokens list. The original token should not be placed in the list, because it is automatically passed through by this filter.
Declaration
protected override void Decompose()
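The brute-force behavior described above can be illustrated with a standalone, conceptual sketch. DecomposeBruteForce is a hypothetical helper, not the filter's internal code; the real filter additionally passes the original token through and honors onlyLongestMatch:

```csharp
using System.Collections.Generic;

// Conceptual illustration only: check every substring of the term whose length
// lies within [minSubwordSize, maxSubwordSize] against the dictionary.
static IEnumerable<string> DecomposeBruteForce(
    string term, ISet<string> dictionary,
    int minWordSize, int minSubwordSize, int maxSubwordSize)
{
    if (term.Length < minWordSize)
    {
        yield break; // terms that are too short are not decomposed
    }
    for (int start = 0; start + minSubwordSize <= term.Length; start++)
    {
        for (int len = minSubwordSize;
             len <= maxSubwordSize && start + len <= term.Length;
             len++)
        {
            string candidate = term.Substring(start, len);
            if (dictionary.Contains(candidate))
            {
                yield return candidate; // becomes a subword token in the real filter
            }
        }
    }
}
```

Calling this sketch with "donaudampfschiff" and a dictionary containing "donau", "dampf", and "schiff" yields those three subwords (lowercase input is assumed here, since a plain ISet<string> comparison is case-sensitive, unlike a CharArraySet built with ignoreCase = true).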