Class DictionaryCompoundWordTokenFilter

A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.

For example, "Donaudampfschiff" is decomposed into "Donau", "dampf", and "schiff", so that a search for "schiff" alone still finds "Donaudampfschiff". The decomposition uses a brute-force algorithm.
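
To illustrate what a brute-force decomposition can look like, here is a minimal, self-contained sketch. It is not the library's implementation: `BruteForceDecompounder` and its `Decompose` method are hypothetical names, a plain `HashSet<string>` stands in for the dictionary, and the size bounds are treated as inclusive purely for the sake of the example.

```csharp
using System;
using System.Collections.Generic;

public static class BruteForceDecompounder
{
    // Hypothetical helper, not part of Lucene.Net: returns every dictionary
    // word that occurs as a substring of `word`, scanning left to right.
    public static List<string> Decompose(
        string word, ISet<string> dictionary, int minSubwordSize, int maxSubwordSize)
    {
        var parts = new List<string>();
        string lower = word.ToLowerInvariant(); // dictionary entries are assumed lowercase

        // Try every start offset combined with every allowed subword length.
        for (int start = 0; start + minSubwordSize <= lower.Length; start++)
        {
            for (int len = minSubwordSize; len <= maxSubwordSize; len++)
            {
                if (start + len > lower.Length)
                {
                    break; // candidate would run past the end of the word
                }

                if (dictionary.Contains(lower.Substring(start, len)))
                {
                    // Keep the original casing of the matched slice.
                    parts.Add(word.Substring(start, len));
                }
            }
        }

        return parts;
    }
}
```

Calling `BruteForceDecompounder.Decompose("Donaudampfschiff", new HashSet<string> { "donau", "dampf", "schiff" }, 2, 15)` yields `Donau`, `dampf`, `schiff` in that order. Testing every offset against every length is simple but costs roughly word length times subword range dictionary lookups per token, which is why the size limits matter.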

You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.

Inheritance

System.Object

Lucene.Net.Util.AttributeSource

Lucene.Net.Analysis.TokenStream

Lucene.Net.Analysis.TokenFilter

CompoundWordTokenFilterBase

DictionaryCompoundWordTokenFilter

Implements

System.IDisposable

Inherited Members

CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE

CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE

CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE

CompoundWordTokenFilterBase.m_matchVersion

CompoundWordTokenFilterBase.m_dictionary

CompoundWordTokenFilterBase.m_tokens

CompoundWordTokenFilterBase.m_minWordSize

CompoundWordTokenFilterBase.m_minSubwordSize

CompoundWordTokenFilterBase.m_maxSubwordSize

CompoundWordTokenFilterBase.m_onlyLongestMatch

CompoundWordTokenFilterBase.m_termAtt

CompoundWordTokenFilterBase.m_offsetAtt

CompoundWordTokenFilterBase.IncrementToken()

CompoundWordTokenFilterBase.Reset()

Lucene.Net.Analysis.TokenFilter.m_input

Lucene.Net.Analysis.TokenFilter.End()

TokenFilter.Dispose(Boolean)

Lucene.Net.Analysis.TokenStream.Dispose()

Lucene.Net.Util.AttributeSource.GetAttributeFactory()

Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()

Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()

Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)

Lucene.Net.Util.AttributeSource.AddAttribute<T>()

Lucene.Net.Util.AttributeSource.HasAttributes

Lucene.Net.Util.AttributeSource.HasAttribute<T>()

Lucene.Net.Util.AttributeSource.GetAttribute<T>()

Lucene.Net.Util.AttributeSource.ClearAttributes()

Lucene.Net.Util.AttributeSource.CaptureState()

Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)

Lucene.Net.Util.AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)

Lucene.Net.Util.AttributeSource.CloneAttributes()

Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)

Lucene.Net.Util.AttributeSource.ToString()

System.Object.Equals(System.Object, System.Object)

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

Namespace: Lucene.Net.Analysis.Compound

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public class DictionaryCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable

Constructors

DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet)

Creates a new DictionaryCompoundWordTokenFilter.

Declaration

public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	The Lucene compatibility version. Versions later than 3.0 enable correct Unicode 4.0 behavior in the dictionaries. See CompoundWordTokenFilterBase for details.
Lucene.Net.Analysis.TokenStream	input	The Lucene.Net.Analysis.TokenStream to process.
CharArraySet	dictionary	The word dictionary to match against.
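
As a usage sketch for this constructor, the example below wraps a tokenizer in the filter and collects the emitted terms. It assumes the Lucene.Net 4.8 packages and namespaces; `WhitespaceTokenizer`, `ICharTermAttribute`, `LuceneVersion.LUCENE_48`, and the three-word dictionary are choices made for illustration and are not prescribed by this page. `CompoundFilterExample` and `Tokenize` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundFilterExample
{
    // Hypothetical helper: runs `text` through the filter and returns all terms.
    public static List<string> Tokenize(string text)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Illustrative dictionary; ignoreCase = true so "Donau" can match "donau".
        var dictionary = new CharArraySet(
            version, new[] { "donau", "dampf", "schiff" }, true);

        var terms = new List<string>();
        TokenStream stream = new WhitespaceTokenizer(version, new StringReader(text));
        stream = new DictionaryCompoundWordTokenFilter(version, stream, dictionary);

        try
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                      // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());   // copy the term before the buffer is reused
            }
            stream.End();
        }
        finally
        {
            stream.Dispose();                    // also releases the wrapped tokenizer
        }

        return terms;
    }
}
```

With the input "Donaudampfschiff", the returned list should include "schiff" among its terms, in line with the class description above.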

DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet, Int32, Int32, Int32, Boolean)

Creates a new DictionaryCompoundWordTokenFilter with explicit word and subword size limits.

Declaration

public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)

Parameters

Type	Name	Description
Lucene.Net.Util.LuceneVersion	matchVersion	The Lucene compatibility version. Versions later than 3.0 enable correct Unicode 4.0 behavior in the dictionaries. See CompoundWordTokenFilterBase for details.
Lucene.Net.Analysis.TokenStream	input	The Lucene.Net.Analysis.TokenStream to process.
CharArraySet	dictionary	The word dictionary to match against.
System.Int32	minWordSize	Only words longer than this are processed.
System.Int32	minSubwordSize	Only subwords longer than this are added to the output stream.
System.Int32	maxSubwordSize	Only subwords shorter than this are added to the output stream.
System.Boolean	onlyLongestMatch	If true, only the longest matching subword is added to the stream.
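
The sketch below shows one way to call this overload. It passes the DEFAULT_* constants inherited from CompoundWordTokenFilterBase for the three size limits and turns on onlyLongestMatch. `CompoundFilterFactory` and `Wrap` are hypothetical names, and `LuceneVersion.LUCENE_48` is an assumed version choice.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundFilterFactory
{
    // Hypothetical helper: wraps an existing stream with explicit size limits.
    public static TokenStream Wrap(TokenStream input, CharArraySet dictionary)
    {
        return new DictionaryCompoundWordTokenFilter(
            LuceneVersion.LUCENE_48,
            input,
            dictionary,
            minWordSize: CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE,
            minSubwordSize: CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE,
            maxSubwordSize: CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE,
            onlyLongestMatch: true); // keep only the longest matching subword
    }
}
```

Raising minSubwordSize is a common way to suppress very short accidental matches when the dictionary contains short words; the right values depend on the dictionary and language.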

Methods

Decompose()

Declaration

protected override void Decompose()

Overrides

CompoundWordTokenFilterBase.Decompose()
