
    Class HyphenationCompoundWordTokenFilter

    A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.

    For example, "Donaudampfschiff" becomes "Donau", "dampf", and "schiff", so that a search for "schiff" can find "Donaudampfschiff". The filter uses a hyphenation grammar to find candidate split points and a word dictionary to decide which of the resulting parts to emit.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating a CompoundWordTokenFilterBase:

    • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
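    A minimal end-to-end sketch of the filter in an analysis chain, assuming Lucene.Net 4.8 (LuceneVersion.LUCENE_48) with a WhitespaceTokenizer as the upstream tokenizer. CompoundDemo and Analyze are hypothetical names; the HyphenationTree would normally come from one of the GetHyphenationTree overloads documented on this page.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundDemo
{
    // Hypothetical helper: runs `text` through a WhitespaceTokenizer followed by a
    // HyphenationCompoundWordTokenFilter and collects every emitted term.
    public static List<string> Analyze(HyphenationTree hyphenator, CharArraySet dictionary, string text)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        var terms = new List<string>();

        using TokenStream stream = new HyphenationCompoundWordTokenFilter(
            version,
            new WhitespaceTokenizer(version, new StringReader(text)),
            hyphenator,
            dictionary);

        var termAtt = stream.AddAttribute<ICharTermAttribute>();
        stream.Reset();                    // required before the first IncrementToken()
        while (stream.IncrementToken())
        {
            terms.Add(termAtt.ToString()); // the original token comes first, then its subwords
        }
        stream.End();
        return terms;
    }
}
```

    With a tree loaded from a German grammar file and a dictionary such as new CharArraySet(LuceneVersion.LUCENE_48, new[] { "donau", "dampf", "schiff" }, true), analyzing "Donaudampfschiff" should return the original token followed by whichever dictionary words fall between the grammar's hyphenation points; the exact subwords depend on the grammar.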
    Inheritance
    object
    AttributeSource
    TokenStream
    TokenFilter
    CompoundWordTokenFilterBase
    HyphenationCompoundWordTokenFilter
    Implements
    IDisposable
    Inherited Members
    CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE
    CompoundWordTokenFilterBase.m_matchVersion
    CompoundWordTokenFilterBase.m_dictionary
    CompoundWordTokenFilterBase.m_tokens
    CompoundWordTokenFilterBase.m_minWordSize
    CompoundWordTokenFilterBase.m_minSubwordSize
    CompoundWordTokenFilterBase.m_maxSubwordSize
    CompoundWordTokenFilterBase.m_onlyLongestMatch
    CompoundWordTokenFilterBase.m_termAtt
    CompoundWordTokenFilterBase.m_offsetAtt
    CompoundWordTokenFilterBase.IncrementToken()
    CompoundWordTokenFilterBase.Reset()
    TokenFilter.m_input
    TokenFilter.End()
    TokenFilter.Dispose(bool)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis.Compound
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class HyphenationCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable

    Constructors

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree)

    Creates a HyphenationCompoundWordTokenFilter with no dictionary, using the default minimum word size, minimum subword size, and maximum subword size.

    Calls HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, int, int, int)

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator)
    Parameters
    Type Name Description
    LuceneVersion matchVersion
    TokenStream input
    HyphenationTree hyphenator

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet)

    Creates a new HyphenationCompoundWordTokenFilter instance.

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    The Lucene version; versions later than 3.0 enable correct Unicode 4.0 behavior in the dictionaries. See CompoundWordTokenFilterBase for details.

    TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    HyphenationTree hyphenator

    the hyphenation pattern tree to use for hyphenation

    CharArraySet dictionary

    the word dictionary to match against.

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet, int, int, int, bool)

    Creates a new HyphenationCompoundWordTokenFilter instance.

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    The Lucene version; versions later than 3.0 enable correct Unicode 4.0 behavior in the dictionaries. See CompoundWordTokenFilterBase for details.

    TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    HyphenationTree hyphenator

    the hyphenation pattern tree to use for hyphenation

    CharArraySet dictionary

    the word dictionary to match against.

    int minWordSize

    the minimum length a token must have to be decomposed

    int minSubwordSize

    the minimum length of a subword that is added to the output stream

    int maxSubwordSize

    the maximum length of a subword that is added to the output stream

    bool onlyLongestMatch

    Add only the longest matching subword starting at each hyphenation point to the stream
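    A sketch of calling this overload with explicit values, assuming Lucene.Net 4.8 (LuceneVersion.LUCENE_48). FilterFactory and Create are hypothetical names. The minimum word size and maximum subword size reuse the DEFAULT_* constants inherited from CompoundWordTokenFilterBase, while minSubwordSize = 4 and onlyLongestMatch = true are arbitrary illustrative choices.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class FilterFactory
{
    // Hypothetical helper: wraps `input` in a HyphenationCompoundWordTokenFilter
    // with explicit size limits. Passing a null `dictionary` means "no dictionary".
    public static TokenStream Create(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
    {
        return new HyphenationCompoundWordTokenFilter(
            LuceneVersion.LUCENE_48,
            input,
            hyphenator,
            dictionary,
            minWordSize: CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE,
            minSubwordSize: 4,        // illustrative: skip subwords shorter than 4 chars
            maxSubwordSize: CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE,
            onlyLongestMatch: true);  // keep one (the longest) match per start position
    }
}
```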

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, int, int, int)

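    Creates a HyphenationCompoundWordTokenFilter with no dictionary. Without a dictionary, every part between hyphenation points that satisfies the size limits is added to the output stream.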

    Calls HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet, int, int, int, bool)

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize)
    Parameters
    Type Name Description
    LuceneVersion matchVersion
    TokenStream input
    HyphenationTree hyphenator
    int minWordSize
    int minSubwordSize
    int maxSubwordSize

    Methods

    Decompose()

    Decomposes the current m_termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the m_tokens list. The original token must not be added to the list, because it is automatically passed through this filter.

    Declaration
    protected override void Decompose()
    Overrides
    CompoundWordTokenFilterBase.Decompose()
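    The following is a simplified, dependency-free sketch of the hyphenation-based decomposition strategy, not the library's actual code. It assumes the hyphenation points are already known and include 0 and the term length (in the real filter they come from the HyphenationTree), and it omits the minWordSize check, which is assumed to happen in the base class before Decompose() is called. DecomposeSketch and its parameters are hypothetical. The retry with the last character dropped mirrors how the Java Lucene implementation tolerates linking characters such as a genitive "s"; treat that detail as an assumption about Lucene.Net.

```csharp
using System.Collections.Generic;

public static class DecomposeSketch
{
    // Returns the subwords of `term` that lie between pairs of hyphenation points.
    // `points` is assumed to be strictly increasing and to include 0 and term.Length.
    // A null `dictionary` accepts every part that satisfies the size limits.
    public static List<string> Decompose(
        string term, int[] points, ISet<string> dictionary,
        int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
    {
        var parts = new List<string>();
        for (int i = 0; i < points.Length; i++)
        {
            int start = points[i];
            string longest = null;
            for (int j = i + 1; j < points.Length; j++)
            {
                int length = points[j] - start;
                if (length > maxSubwordSize) break;    // later parts are even longer
                if (length < minSubwordSize) continue; // too short to emit

                string part = term.Substring(start, length);
                string match = null;
                if (dictionary == null || dictionary.Contains(part))
                    match = part;
                else if (dictionary.Contains(part.Substring(0, length - 1)))
                    match = part.Substring(0, length - 1); // drop a trailing linking char

                if (match == null) continue;
                if (!onlyLongestMatch) parts.Add(match);
                else if (longest == null || match.Length > longest.Length) longest = match;
            }
            if (onlyLongestMatch && longest != null) parts.Add(longest);
        }
        return parts;
    }
}
```

    For example, Decompose("Donaudampfschiff", new[] { 0, 5, 10, 16 }, dict, 2, 15, false) with a case-insensitive dict of "donau", "dampf", and "schiff" yields Donau, dampf, schiff; the 16-character whole word is skipped because it exceeds maxSubwordSize.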

    GetHyphenationTree(FileInfo)

    Creates a HyphenationTree from the XML hyphenation grammar in the specified file.

    Declaration
    public static HyphenationTree GetHyphenationTree(FileInfo hyphenationFile)
    Parameters
    Type Name Description
    FileInfo hyphenationFile

    the XML grammar file to load

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.

    GetHyphenationTree(FileInfo, Encoding)

    Creates a HyphenationTree from the XML hyphenation grammar in the specified file, read using the given character encoding.

    Declaration
    public static HyphenationTree GetHyphenationTree(FileInfo hyphenationFile, Encoding encoding)
    Parameters
    Type Name Description
    FileInfo hyphenationFile

    the XML grammar file to load

    Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.

    GetHyphenationTree(Stream)

    Creates a HyphenationTree from the XML hyphenation grammar read from the specified stream.

    Declaration
    public static HyphenationTree GetHyphenationTree(Stream hyphenationSource)
    Parameters
    Type Name Description
    Stream hyphenationSource

    the stream containing the XML grammar

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.

    GetHyphenationTree(Stream, Encoding)

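    Creates a HyphenationTree from the XML hyphenation grammar read from the specified stream using the given character encoding.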

    Declaration
    public static HyphenationTree GetHyphenationTree(Stream hyphenationSource, Encoding encoding)
    Parameters
    Type Name Description
    Stream hyphenationSource

    the stream containing the XML grammar

    Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.
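    A sketch of loading a grammar through this overload, assuming the grammar is an XML hyphenation pattern file on disk encoded as UTF-8. GrammarLoader and Load are hypothetical names, and the path is supplied by the caller.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;

public static class GrammarLoader
{
    // Hypothetical helper: opens the grammar file and builds a HyphenationTree from it.
    // The stream is disposed by the using declaration once the tree has been built.
    public static HyphenationTree Load(string path)
    {
        using Stream stream = File.OpenRead(path);
        return HyphenationCompoundWordTokenFilter.GetHyphenationTree(stream, Encoding.UTF8);
    }
}
```

    Reading through a Stream makes it straightforward to load grammars that are not plain files, such as embedded resources; the string and FileInfo overloads cover the file-on-disk case directly.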

    GetHyphenationTree(string)

    Creates a HyphenationTree from the XML hyphenation grammar in the file with the specified name.

    Declaration
    public static HyphenationTree GetHyphenationTree(string hyphenationFilename)
    Parameters
    Type Name Description
    string hyphenationFilename

    the filename of the XML grammar to load

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.

    GetHyphenationTree(string, Encoding)

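    Creates a HyphenationTree from the XML hyphenation grammar in the file with the specified name, read using the given character encoding.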

    Declaration
    public static HyphenationTree GetHyphenationTree(string hyphenationFilename, Encoding encoding)
    Parameters
    Type Name Description
    string hyphenationFilename

    the filename of the XML grammar to load

    Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.