    Class HyphenationCompoundWordTokenFilter

    A TokenFilter that decomposes compound words found in many Germanic languages.

    For example, "Donaudampfschiff" is decomposed into "Donau", "dampf", and "schiff", so that a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a hyphenation grammar and, optionally, a word dictionary to find the split points.

    You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

    • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
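
    As a rough illustration of the decomposition described above, the following sketch wires the filter behind a WhitespaceTokenizer and collects the emitted terms. The class name CompoundDemo, the method name Decompose, the grammarPath parameter, and the three-word dictionary are assumptions made for this example; the grammar file must be supplied by the caller.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundDemo
{
    // grammarPath is a hypothetical path to an XML hyphenation grammar
    // chosen by the caller; it is not shipped with this example.
    public static IList<string> Decompose(string text, string grammarPath)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Load the hyphenation patterns from the XML grammar file.
        HyphenationTree hyphenator =
            HyphenationCompoundWordTokenFilter.GetHyphenationTree(grammarPath);

        // Illustrative dictionary only; ignoreCase = true so "Donau" matches "donau".
        var dictionary = new CharArraySet(
            version, new[] { "donau", "dampf", "schiff" }, true);

        var terms = new List<string>();
        Tokenizer tokenizer = new WhitespaceTokenizer(version, new StringReader(text));

        using (TokenStream stream = new HyphenationCompoundWordTokenFilter(
            version, tokenizer, hyphenator, dictionary))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                      // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());   // each emitted term, in stream order
            }
            stream.End();
        }
        return terms;
    }
}
```

    Calling CompoundDemo.Decompose("Donaudampfschiff", grammarPath) returns the terms the filter emits for that input. Which subwords appear depends on the grammar and dictionary in use.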

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    TokenFilter
    CompoundWordTokenFilterBase
    HyphenationCompoundWordTokenFilter
    Implements
    IDisposable
    Inherited Members
    CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE
    CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE
    CompoundWordTokenFilterBase.m_matchVersion
    CompoundWordTokenFilterBase.m_dictionary
    CompoundWordTokenFilterBase.m_tokens
    CompoundWordTokenFilterBase.m_minWordSize
    CompoundWordTokenFilterBase.m_minSubwordSize
    CompoundWordTokenFilterBase.m_maxSubwordSize
    CompoundWordTokenFilterBase.m_onlyLongestMatch
    CompoundWordTokenFilterBase.m_termAtt
    CompoundWordTokenFilterBase.m_offsetAtt
    CompoundWordTokenFilterBase.IncrementToken()
    CompoundWordTokenFilterBase.Reset()
    TokenFilter.m_input
    TokenFilter.End()
    TokenFilter.Dispose(Boolean)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.Compound
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class HyphenationCompoundWordTokenFilter : CompoundWordTokenFilterBase, IDisposable

    Constructors

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree)

    Create a HyphenationCompoundWordTokenFilter with no dictionary.

    Calls HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, Int32, Int32, Int32)

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator)
    Parameters
    Type Name Description
    LuceneVersion matchVersion
    TokenStream input
    HyphenationTree hyphenator

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet)

    Creates a new HyphenationCompoundWordTokenFilter instance.

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.

    TokenStream input

    the TokenStream to process

    HyphenationTree hyphenator

    the hyphenation pattern tree to use for hyphenation

    CharArraySet dictionary

    the word dictionary to match against.

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet, Int32, Int32, Int32, Boolean)

    Creates a new HyphenationCompoundWordTokenFilter instance.

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.

    TokenStream input

    the TokenStream to process

    HyphenationTree hyphenator

    the hyphenation pattern tree to use for hyphenation

    CharArraySet dictionary

    the word dictionary to match against.

    System.Int32 minWordSize

    only words longer than this get processed

    System.Int32 minSubwordSize

    only subwords longer than this get to the output stream

    System.Int32 maxSubwordSize

    only subwords shorter than this get to the output stream

    System.Boolean onlyLongestMatch

    Add only the longest matching subword to the stream
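
    The size and match parameters above can be passed explicitly, as in the sketch below. The class name CompoundFilterFactory and the method name Wrap are hypothetical. The size arguments reuse the DEFAULT_* constants inherited from CompoundWordTokenFilterBase, and enabling onlyLongestMatch is an arbitrary choice for the example rather than a recommended setting.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundFilterFactory
{
    // Wraps an existing token stream with a fully parameterized filter.
    // The caller provides the hyphenation tree and the dictionary.
    public static TokenStream Wrap(
        TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
    {
        return new HyphenationCompoundWordTokenFilter(
            LuceneVersion.LUCENE_48,
            input,
            hyphenator,
            dictionary,
            CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE,     // minWordSize
            CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE,  // minSubwordSize
            CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE,  // maxSubwordSize
            true);                                                 // onlyLongestMatch
    }
}
```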

    HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, Int32, Int32, Int32)

    Create a HyphenationCompoundWordTokenFilter with no dictionary.

    Calls HyphenationCompoundWordTokenFilter(LuceneVersion, TokenStream, HyphenationTree, CharArraySet, Int32, Int32, Int32, Boolean)

    Declaration
    public HyphenationCompoundWordTokenFilter(LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize)
    Parameters
    Type Name Description
    LuceneVersion matchVersion
    TokenStream input
    HyphenationTree hyphenator
    System.Int32 minWordSize
    System.Int32 minSubwordSize
    System.Int32 maxSubwordSize

    Methods

    Decompose()

    Declaration
    protected override void Decompose()
    Overrides
    CompoundWordTokenFilterBase.Decompose()

    GetHyphenationTree(FileInfo)

    Creates a hyphenation tree from an XML hyphenation grammar.

    Declaration
    public static HyphenationTree GetHyphenationTree(FileInfo hyphenationFile)
    Parameters
    Type Name Description
    FileInfo hyphenationFile

    the file of the XML grammar to load

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    GetHyphenationTree(FileInfo, Encoding)

    Creates a hyphenation tree from an XML hyphenation grammar.

    Declaration
    public static HyphenationTree GetHyphenationTree(FileInfo hyphenationFile, Encoding encoding)
    Parameters
    Type Name Description
    FileInfo hyphenationFile

    the file of the XML grammar to load

    Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    GetHyphenationTree(Stream)

    Creates a hyphenation tree from an XML hyphenation grammar.

    Declaration
    public static HyphenationTree GetHyphenationTree(Stream hyphenationSource)
    Parameters
    Type Name Description
    Stream hyphenationSource

    the stream containing the XML grammar

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    GetHyphenationTree(Stream, Encoding)

    Creates a hyphenation tree from an XML hyphenation grammar.

    Declaration
    public static HyphenationTree GetHyphenationTree(Stream hyphenationSource, Encoding encoding)
    Parameters
    Type Name Description
    Stream hyphenationSource

    the stream containing the XML grammar

    Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    GetHyphenationTree(String)

    Creates a hyphenation tree from an XML hyphenation grammar.

    Declaration
    public static HyphenationTree GetHyphenationTree(string hyphenationFilename)
    Parameters
    Type Name Description
    System.String hyphenationFilename

    the filename of the XML grammar to load

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns

    GetHyphenationTree(String, Encoding)

    Creates a hyphenation tree from an XML hyphenation grammar.

    Declaration
    public static HyphenationTree GetHyphenationTree(string hyphenationFilename, Encoding encoding)
    Parameters
    Type Name Description
    System.String hyphenationFilename

    the filename of the XML grammar to load

    Encoding encoding

    The character encoding to use

    Returns
    Type Description
    HyphenationTree

    An object representing the hyphenation patterns
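
    The GetHyphenationTree overloads differ only in how the grammar is located and whether an encoding is given. A minimal sketch of two of them follows. The class and method names are hypothetical, and UTF-8 is an assumption about how the grammar file is encoded.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;

public static class HyphenationTreeLoading
{
    // Loads the grammar by file name with an explicit encoding.
    public static HyphenationTree FromFileName(string path)
    {
        return HyphenationCompoundWordTokenFilter.GetHyphenationTree(path, Encoding.UTF8);
    }

    // Loads the grammar from a stream the caller opens and disposes itself.
    public static HyphenationTree FromStream(string path)
    {
        using (Stream source = File.OpenRead(path))
        {
            return HyphenationCompoundWordTokenFilter.GetHyphenationTree(source, Encoding.UTF8);
        }
    }
}
```

    The returned HyphenationTree can then be passed to any of the constructors above. Loading the tree once and reusing it across filter instances avoids parsing the grammar repeatedly.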

    Implements

    IDisposable
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)