
    Class DictionaryCompoundWordTokenFilter

    A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.

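    "Donaudampfschiff" is decomposed into "Donau", "dampf" and "schiff" (the original token is kept as well), so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this: every substring within the configured length bounds is looked up in the dictionary.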
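The brute-force search can be sketched in plain C# without any Lucene.Net types. This is an illustration, not the library's source: a case-insensitive `HashSet<string>` stands in for `CharArraySet`, and the hypothetical local function `FindSubwords` looks up every substring whose length lies between the minimum and maximum subword size, starting from every position of a term that is at least `minWordSize` characters long. The default bounds 5, 2 and 15 correspond to `DEFAULT_MIN_WORD_SIZE`, `DEFAULT_MIN_SUBWORD_SIZE` and `DEFAULT_MAX_SUBWORD_SIZE`. The sketch returns only the subwords, since the filter passes the original token through on its own.

```csharp
using System;
using System.Collections.Generic;

// Assumption: a case-insensitive HashSet stands in for the CharArraySet dictionary.
var dictionary = new HashSet<string>(
    new[] { "donau", "dampf", "schiff" }, StringComparer.OrdinalIgnoreCase);

// Hypothetical helper: returns every dictionary word found inside `term`.
// Defaults mirror DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MAX_SUBWORD_SIZE.
List<string> FindSubwords(string term, ISet<string> dict,
    int minWordSize = 5, int minSubwordSize = 2, int maxSubwordSize = 15)
{
    var found = new List<string>();
    if (term.Length < minWordSize) return found; // too short: not decomposed at all

    // Brute force: every start position i, every length j in [minSubwordSize, maxSubwordSize].
    for (int i = 0; i <= term.Length - minSubwordSize; i++)
    {
        for (int j = minSubwordSize; j <= maxSubwordSize && i + j <= term.Length; j++)
        {
            string candidate = term.Substring(i, j);
            if (dict.Contains(candidate)) found.Add(candidate);
        }
    }
    return found;
}

Console.WriteLine(string.Join(", ", FindSubwords("Donaudampfschiff", dictionary)));
// prints "Donau, dampf, schiff"
```

The number of lookups grows with the term length times the width of the subword range, which is why the minimum word size and the subword bounds matter for performance as well as for output.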

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

    • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
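A few lines of plain C# show what makes supplementary characters awkward (the sample string is an arbitrary choice and no Lucene.Net types are involved). A character outside the Basic Multilingual Plane occupies two UTF-16 `char` values, a surrogate pair, so code that processes a dictionary entry one `char` at a time, for example to lower-case it, sees two halves that are not characters on their own. The sketch counts code points with `string.EnumerateRunes()`, available since .NET Core 3.0.

```csharp
using System;
using System.Text;

// U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the Basic Multilingual
// Plane, so UTF-16 stores it as two chars: a high and a low surrogate.
string entry = "ab\U00010400";

int utf16Units = entry.Length; // counts char values, not characters
int codePoints = 0;
foreach (Rune rune in entry.EnumerateRunes()) codePoints++; // counts Unicode scalar values

Console.WriteLine($"{utf16Units} chars, {codePoints} code points");
// prints "4 chars, 3 code points"

// Char-by-char code only ever sees the two surrogate halves.
Console.WriteLine(char.IsSurrogatePair(entry[2], entry[3]));
// prints "True"
```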
    Inheritance
    object
    AttributeSource
    TokenStream
    TokenFilter
    CompoundWordTokenFilterBase
    DictionaryCompoundWordTokenFilter
    Implements
    IDisposable
    Inherited Members
    CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE
    CompoundWordTokenFilterBase.m_matchVersion
    CompoundWordTokenFilterBase.m_dictionary
    CompoundWordTokenFilterBase.m_tokens
    CompoundWordTokenFilterBase.m_minWordSize
    CompoundWordTokenFilterBase.m_minSubwordSize
    CompoundWordTokenFilterBase.m_maxSubwordSize
    CompoundWordTokenFilterBase.m_onlyLongestMatch
    CompoundWordTokenFilterBase.m_termAtt
    CompoundWordTokenFilterBase.m_offsetAtt
    CompoundWordTokenFilterBase.IncrementToken()
    CompoundWordTokenFilterBase.Reset()
    TokenFilter.m_input
    TokenFilter.End()
    TokenFilter.Dispose(bool)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis.Compound
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class DictionaryCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable

    Constructors

    DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet)

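    Creates a new DictionaryCompoundWordTokenFilter with the default word and subword sizes (DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MAX_SUBWORD_SIZE) and onlyLongestMatch set to false.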

    Declaration
    public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Lucene version; versions after 3.0 enable correct Unicode 4.0 behavior in the dictionaries. See CompoundWordTokenFilterBase for details.

    TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    CharArraySet dictionary

    the word dictionary to match against.

    DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet, int, int, int, bool)

    Creates a new DictionaryCompoundWordTokenFilter with explicit word and subword size limits.

    Declaration
    public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Lucene version; versions after 3.0 enable correct Unicode 4.0 behavior in the dictionaries. See CompoundWordTokenFilterBase for details.

    TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    CharArraySet dictionary

    the word dictionary to match against.

    int minWordSize

    only words with at least this many characters are decomposed

    int minSubwordSize

    only subwords with at least this many characters are added to the output stream

    int maxSubwordSize

    only subwords with at most this many characters are added to the output stream

    bool onlyLongestMatch

    if true, add only the longest matching subword for each start position within the word
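The effect of the size bounds and of `onlyLongestMatch` can be sketched by adding a per-position longest-match rule to a brute-force search. The hypothetical local function `Decompose` below is modelled on the seven-argument constructor's parameters, not copied from Lucene.Net; it assumes inclusive size bounds, uses a case-insensitive `HashSet<string>` as the dictionary, and adds `dampfschiff` to the word list so that two entries start at the same position.

```csharp
using System;
using System.Collections.Generic;

// Assumption: a case-insensitive HashSet stands in for the CharArraySet dictionary.
var dictionary = new HashSet<string>(
    new[] { "donau", "dampf", "schiff", "dampfschiff" }, StringComparer.OrdinalIgnoreCase);

// Hypothetical helper mirroring the constructor's parameters (bounds treated as inclusive).
List<string> Decompose(string term, ISet<string> dict, int minWordSize,
    int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
{
    var result = new List<string>();
    if (term.Length < minWordSize) return result;

    for (int i = 0; i <= term.Length - minSubwordSize; i++)
    {
        string? longest = null; // longest match starting at position i
        for (int j = minSubwordSize; j <= maxSubwordSize && i + j <= term.Length; j++)
        {
            string candidate = term.Substring(i, j);
            if (!dict.Contains(candidate)) continue;
            if (onlyLongestMatch) longest = candidate; // j only grows, so the last hit is longest
            else result.Add(candidate);
        }
        if (longest != null) result.Add(longest);
    }
    return result;
}

Console.WriteLine(string.Join(", ", Decompose("Donaudampfschiff", dictionary, 5, 2, 15, false)));
// prints "Donau, dampf, dampfschiff, schiff"
Console.WriteLine(string.Join(", ", Decompose("Donaudampfschiff", dictionary, 5, 2, 15, true)));
// prints "Donau, dampfschiff, schiff"
Console.WriteLine(string.Join(", ", Decompose("Donaudampfschiff", dictionary, 5, 2, 10, true)));
// prints "Donau, dampf, schiff"
```

Because the longest match is chosen separately for each start position, `onlyLongestMatch` in this sketch does not remove "schiff" even though it lies inside "dampfschiff"; it only drops the shorter "dampf", which starts at the same position. Lowering `maxSubwordSize` to 10 excludes the 11-character "dampfschiff", so "dampf" becomes the longest match again.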

    Methods

    Decompose()

    Decomposes the current m_termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the m_tokens list. The original token must not be placed in the list, because it is automatically passed through this filter.

    Declaration
    protected override void Decompose()
    Overrides
    CompoundWordTokenFilterBase.Decompose()
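The division of labour described for Decompose(), where the override only queues subwords and the base class emits the original token itself, can be modelled with a queue. This is a simplified, hypothetical model rather than the code of `CompoundWordTokenFilterBase.IncrementToken()`: tokens are plain strings paired with a position increment, a fixed lookup table replaces dictionary matching, and offsets and other attributes are ignored. It assumes that queued subwords are emitted right after the original token with a position increment of 0, so they occupy the same position as the compound.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the pass-through contract: each input token is emitted
// first, then the subwords queued for it follow at the same position.
IEnumerable<(string Term, int PositionIncrement)> Emit(
    IEnumerable<string> inputTokens, Func<string, IList<string>> decompose)
{
    var pending = new Queue<string>(); // plays the role of m_tokens
    foreach (string token in inputTokens)
    {
        // The decompose step only fills the queue; it never adds the original token.
        foreach (string sub in decompose(token)) pending.Enqueue(sub);

        yield return (token, 1); // the original token passes through unchanged
        while (pending.Count > 0)
            yield return (pending.Dequeue(), 0); // subwords stacked on the same position
    }
}

// Stand-in for dictionary matching: a fixed lookup table (assumption for the demo).
var splits = new Dictionary<string, IList<string>>
{
    ["Donaudampfschiff"] = new[] { "Donau", "dampf", "schiff" },
};
IList<string> FakeDecompose(string word) =>
    splits.TryGetValue(word, out var found) ? found : Array.Empty<string>();

foreach (var (term, posInc) in Emit(new[] { "das", "Donaudampfschiff" }, FakeDecompose))
    Console.WriteLine($"{term} (+{posInc})");
```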

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.