    Class DictionaryCompoundWordTokenFilter

    A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.

    For example, "Donaudampfschiff" is decomposed into "Donau", "dampf", and "schiff", so that a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a brute-force algorithm: it checks substrings of each token against the supplied dictionary.
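The brute-force idea can be sketched without any Lucene.Net dependency. The following is a minimal illustration, not the library's implementation: the class name `BruteForceDecompounder`, its `Decompose` signature, and the inclusive treatment of the size bounds are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class BruteForceDecompounder
{
    // Scans every start offset and every length in
    // [minSubwordSize, maxSubwordSize] (inclusive bounds are an assumption)
    // and collects the substrings that occur in the dictionary.
    public static List<string> Decompose(
        string word,
        ISet<string> dictionary,
        int minSubwordSize = 2,
        int maxSubwordSize = 15,
        bool onlyLongestMatch = false)
    {
        var subwords = new List<string>();
        for (int start = 0; start <= word.Length - minSubwordSize; start++)
        {
            string longest = null;
            for (int len = minSubwordSize; len <= maxSubwordSize; len++)
            {
                if (start + len > word.Length) break;

                string candidate = word.Substring(start, len);
                if (!dictionary.Contains(candidate)) continue;

                if (onlyLongestMatch)
                    longest = candidate; // len only grows, so the last hit is the longest
                else
                    subwords.Add(candidate);
            }
            if (onlyLongestMatch && longest != null) subwords.Add(longest);
        }
        return subwords;
    }

    public static void Main()
    {
        // Case-insensitive lookup so "schiff" matches regardless of casing.
        var dictionary = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "Donau", "dampf", "schiff"
        };

        // Prints Donau, dampf and schiff, one per line.
        foreach (string part in Decompose("Donaudampfschiff", dictionary))
            Console.WriteLine(part);
    }
}
```

The nested loops make the cost roughly proportional to the word length times the range of allowed subword lengths, which is why the size limits matter for long tokens.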

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

    • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.TokenFilter
    CompoundWordTokenFilterBase
    DictionaryCompoundWordTokenFilter
    Implements
    System.IDisposable
    Inherited Members
    CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE
    CompoundWordTokenFilterBase.m_matchVersion
    CompoundWordTokenFilterBase.m_dictionary
    CompoundWordTokenFilterBase.m_tokens
    CompoundWordTokenFilterBase.m_minWordSize
    CompoundWordTokenFilterBase.m_minSubwordSize
    CompoundWordTokenFilterBase.m_maxSubwordSize
    CompoundWordTokenFilterBase.m_onlyLongestMatch
    CompoundWordTokenFilterBase.m_termAtt
    CompoundWordTokenFilterBase.m_offsetAtt
    CompoundWordTokenFilterBase.IncrementToken()
    CompoundWordTokenFilterBase.Reset()
    Lucene.Net.Analysis.TokenFilter.m_input
    Lucene.Net.Analysis.TokenFilter.End()
    TokenFilter.Dispose(Boolean)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Compound
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class DictionaryCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable

    Constructors

    DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet)

    Creates a new DictionaryCompoundWordTokenFilter

    Declaration
    public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    The Lucene version to match. Versions later than 3.0 enable correct Unicode 4.0 handling in the dictionaries. See CompoundWordTokenFilterBase for details.

    Lucene.Net.Analysis.TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    CharArraySet dictionary

    the word dictionary to match against.
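A minimal usage sketch for this constructor, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. The class name `DecompoundExample`, the helper `Tokens`, the three-word dictionary, and the choice of `WhitespaceTokenizer` are illustrative assumptions, not part of this API page.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical example class for illustration only.
public static class DecompoundExample
{
    public static List<string> Tokens(string text)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Illustrative dictionary; ignoreCase = true so "Donau" also matches "donau".
        var dictionary = new CharArraySet(version, new[] { "Donau", "dampf", "schiff" }, true);

        var result = new List<string>();
        Tokenizer tokenizer = new WhitespaceTokenizer(version, new StringReader(text));
        using (TokenStream stream = new DictionaryCompoundWordTokenFilter(version, tokenizer, dictionary))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                 // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                result.Add(term.ToString());
            }
            stream.End();
        }
        return result;
    }

    public static void Main()
    {
        foreach (string token in Tokens("Donaudampfschiff"))
            Console.WriteLine(token);
    }
}
```

With this dictionary the stream should contain the original token as well as the matched subwords, so the compound stays searchable both as a whole and by its parts.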

    DictionaryCompoundWordTokenFilter(LuceneVersion, TokenStream, CharArraySet, Int32, Int32, Int32, Boolean)

    Creates a new DictionaryCompoundWordTokenFilter with explicit word and subword size limits and a longest-match option.

    Declaration
    public DictionaryCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    The Lucene version to match. Versions later than 3.0 enable correct Unicode 4.0 handling in the dictionaries. See CompoundWordTokenFilterBase for details.

    Lucene.Net.Analysis.TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    CharArraySet dictionary

    the word dictionary to match against.

    System.Int32 minWordSize

    only words longer than this get processed

    System.Int32 minSubwordSize

    only subwords longer than this get to the output stream

    System.Int32 maxSubwordSize

    only subwords shorter than this get to the output stream

    System.Boolean onlyLongestMatch

    Add only the longest matching subword to the stream
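To illustrate the size limits and `onlyLongestMatch`, here is a hedged sketch that passes the inherited default constants explicitly and toggles the flag. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages; the class name `LongestMatchExample`, the helper `Tokens`, and the dictionary contents are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical example class for illustration only.
public static class LongestMatchExample
{
    public static List<string> Tokens(string text, bool onlyLongestMatch)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // "dampf" and "dampfschiff" both start at the same offset in the input,
        // so the flag decides whether one or both are emitted there.
        var dictionary = new CharArraySet(
            version, new[] { "Donau", "dampf", "schiff", "dampfschiff" }, true);

        var result = new List<string>();
        Tokenizer tokenizer = new WhitespaceTokenizer(version, new StringReader(text));
        using (TokenStream stream = new DictionaryCompoundWordTokenFilter(
            version,
            tokenizer,
            dictionary,
            CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE,
            CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE,
            CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE,
            onlyLongestMatch))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                result.Add(term.ToString());
            }
            stream.End();
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Tokens("Donaudampfschiff", false)));
        Console.WriteLine(string.Join(", ", Tokens("Donaudampfschiff", true)));
    }
}
```

With `onlyLongestMatch` set to true, the shorter "dampf" should be dropped in favour of "dampfschiff" at that position, while matches starting at other positions, such as "schiff", are still reported.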

    Methods

    Decompose()

    Declaration
    protected override void Decompose()
    Overrides
    CompoundWordTokenFilterBase.Decompose()

    Implements

    System.IDisposable
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.