    Class HyphenationCompoundWordTokenFilter

    A Lucene.Net.Analysis.TokenFilter that decomposes compound words found in many Germanic languages.

    "Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a hyphenation grammar and a word dictionary to achieve this.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

    • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
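The decomposition described above can be sketched end to end as follows. This is a minimal illustration, not a reference implementation: the grammar path `de_DR.xml`, the three dictionary words, and the choice of `WhitespaceTokenizer` are assumptions made for the example. The tokens actually emitted depend on the hyphenation grammar you load, so no output is shown.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class DecompoundExample
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Assumption: a German hyphenation grammar in XML form exists at this path.
        HyphenationTree hyphenator =
            HyphenationCompoundWordTokenFilter.GetHyphenationTree("de_DR.xml");

        // Illustrative dictionary; ignoreCase = true so "Donau" matches "donau".
        var dictionary = new CharArraySet(
            version, new[] { "donau", "dampf", "schiff" }, true);

        using (TextReader reader = new StringReader("Donaudampfschiff"))
        using (TokenStream stream = new HyphenationCompoundWordTokenFilter(
            version, new WhitespaceTokenizer(version, reader), hyphenator, dictionary))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

            // Standard TokenStream consumption: Reset, IncrementToken loop, End.
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

The filter is wrapped in `using` because the class implements System.IDisposable; disposing it also closes the wrapped tokenizer.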

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.TokenFilter
    CompoundWordTokenFilterBase
    HyphenationCompoundWordTokenFilter
    Implements
    System.IDisposable
    Inherited Members
    CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE
    CompoundWordTokenFilterBase.m_matchVersion
    CompoundWordTokenFilterBase.m_dictionary
    CompoundWordTokenFilterBase.m_tokens
    CompoundWordTokenFilterBase.m_minWordSize
    CompoundWordTokenFilterBase.m_minSubwordSize
    CompoundWordTokenFilterBase.m_maxSubwordSize
    CompoundWordTokenFilterBase.m_onlyLongestMatch
    CompoundWordTokenFilterBase.m_termAtt
    CompoundWordTokenFilterBase.m_offsetAtt
    CompoundWordTokenFilterBase.IncrementToken()
    CompoundWordTokenFilterBase.Reset()
    Lucene.Net.Analysis.TokenFilter.m_input
    Lucene.Net.Analysis.TokenFilter.End()
    TokenFilter.Dispose(Boolean)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Compound
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class HyphenationCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable

    Constructors


    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree)

    Creates a HyphenationCompoundWordTokenFilter with no dictionary.

    Calls HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, Int32, Int32, Int32).

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion
    Lucene.Net.Analysis.TokenStream input
    HyphenationTree hyphenator

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet)

    Creates a new HyphenationCompoundWordTokenFilter instance.

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.

    Lucene.Net.Analysis.TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    HyphenationTree hyphenator

    the hyphenation pattern tree to use for hyphenation

    CharArraySet dictionary

    the word dictionary to match against.


    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet, Int32, Int32, Int32, Boolean)

    Creates a new HyphenationCompoundWordTokenFilter instance.

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion

    Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.

    Lucene.Net.Analysis.TokenStream input

    the Lucene.Net.Analysis.TokenStream to process

    HyphenationTree hyphenator

    the hyphenation pattern tree to use for hyphenation

    CharArraySet dictionary

    the word dictionary to match against.

    System.Int32 minWordSize

    only words longer than this get processed

    System.Int32 minSubwordSize

    only subwords longer than this get to the output stream

    System.Int32 maxSubwordSize

    only subwords shorter than this get to the output stream

    System.Boolean onlyLongestMatch

    Add only the longest matching subword to the stream
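As a hedged sketch of how the size limits and `onlyLongestMatch` fit together, the helper below passes every parameter of this constructor by name. The helper name `Wrap` and the values 6, 3, and 12 are illustrative assumptions, not recommended settings; the upstream stream, hyphenation tree, and dictionary are supplied by the caller.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundFilterFactory
{
    // Hypothetical helper: wraps an existing token stream in the filter,
    // spelling out each tuning parameter by its documented name.
    public static TokenStream Wrap(
        TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
    {
        return new HyphenationCompoundWordTokenFilter(
            matchVersion: LuceneVersion.LUCENE_48,
            input: input,
            hyphenator: hyphenator,
            dictionary: dictionary,
            minWordSize: 6,          // illustrative value
            minSubwordSize: 3,       // illustrative value
            maxSubwordSize: 12,      // illustrative value
            onlyLongestMatch: true); // add only the longest matching subword
    }
}
```

Named arguments are used here only because the three consecutive `int` parameters are easy to transpose when passed positionally.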


    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, Int32, Int32, Int32)

    Creates a HyphenationCompoundWordTokenFilter with no dictionary.

    Calls HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet, Int32, Int32, Int32, Boolean)

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize)
    Parameters
    Type Name Description
    Lucene.Net.Util.LuceneVersion matchVersion
    Lucene.Net.Analysis.TokenStream input
    HyphenationTree hyphenator
    System.Int32 minWordSize
    System.Int32 minSubwordSize
    System.Int32 maxSubwordSize

    Methods


    Decompose()

    Declaration
    protected override void Decompose()
    Overrides
    CompoundWordTokenFilterBase.Decompose()

    GetHyphenationTree(FileInfo)

    Creates a hyphenation tree from the given XML grammar file.

    Declaration
    public static HyphenationTree GetHyphenationTree(FileInfo hyphenationFile)
    Parameters
    Type Name Description
    System.IO.FileInfo hyphenationFile

    the file of the XML grammar to load

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.


    GetHyphenationTree(FileInfo, Encoding)

    Creates a hyphenation tree from the given XML grammar file, read with the given encoding.

    Declaration
    public static HyphenationTree GetHyphenationTree(FileInfo hyphenationFile, Encoding encoding)
    Parameters
    Type Name Description
    System.IO.FileInfo hyphenationFile

    the file of the XML grammar to load

    System.Text.Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.


    GetHyphenationTree(Stream)

    Creates a hyphenation tree from an XML grammar supplied as a stream.

    Declaration
    public static HyphenationTree GetHyphenationTree(Stream hyphenationSource)
    Parameters
    Type Name Description
    System.IO.Stream hyphenationSource

    the stream containing the XML grammar

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.


    GetHyphenationTree(Stream, Encoding)

    Creates a hyphenation tree from an XML grammar supplied as a stream, read with the given encoding.

    Declaration
    public static HyphenationTree GetHyphenationTree(Stream hyphenationSource, Encoding encoding)
    Parameters
    Type Name Description
    System.IO.Stream hyphenationSource

    the stream containing the XML grammar

    System.Text.Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.
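Loading a grammar from a stream with an explicit encoding might look like the sketch below. The path `hyphenation/de_DR.xml` and the choice of UTF-8 are assumptions for illustration; use whatever encoding your grammar file was actually saved in.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;

public static class HyphenationTreeLoader
{
    // Hypothetical helper: opens the grammar file and parses it as UTF-8.
    public static HyphenationTree Load(string path)
    {
        using (Stream source = File.OpenRead(path))
        {
            return HyphenationCompoundWordTokenFilter.GetHyphenationTree(
                source, Encoding.UTF8);
        }
    }

    public static void Main()
    {
        try
        {
            // Assumed location of the XML grammar.
            HyphenationTree tree = Load("hyphenation/de_DR.xml");
            Console.WriteLine("Hyphenation grammar loaded.");
        }
        catch (IOException e)
        {
            // Covers the documented low-level I/O error, including a missing file.
            Console.Error.WriteLine("Could not read grammar: " + e.Message);
        }
    }
}
```

The stream overloads are useful when the grammar is not a plain file on disk, for example when it is shipped as an embedded resource.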


    GetHyphenationTree(String)

    Creates a hyphenation tree from the XML grammar file with the given name.

    Declaration
    public static HyphenationTree GetHyphenationTree(string hyphenationFilename)
    Parameters
    Type Name Description
    System.String hyphenationFilename

    the filename of the XML grammar to load

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.


    GetHyphenationTree(String, Encoding)

    Creates a hyphenation tree from the XML grammar file with the given name, read with the given encoding.

    Declaration
    public static HyphenationTree GetHyphenationTree(string hyphenationFilename, Encoding encoding)
    Parameters
    Type Name Description
    System.String hyphenationFilename

    the filename of the XML grammar to load

    System.Text.Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.

    Implements

    System.IDisposable
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.