    Namespace Egothor.Stemmer

    Egothor stemmer API.

    Classes

    Cell

    A Cell is a portion of a Trie.

    Compile

    The Compile class is used to compile a stemmer table.

    Diff

    The Diff object generates a patch string.

    A patch string is a command that tells a stemmer how to reduce a word to its root. For example, to reduce the word teacher to its root teach, the patch string Db would be generated. This command tells the stemmer to delete the last 2 characters from the word teacher to reach the stem (the patch commands are applied starting from the last character in order to save memory).
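
    To make the Db example concrete, the following is a minimal Python sketch of applying a patch string to a word. It is illustrative only and is not the Egothor API: the function name apply_patch is hypothetical, and the encoding it uses (two-character commands, counts written as letters with a = 1 and b = 2, D for delete, - for skip) is an assumption inferred from the Db example. Any other commands are deliberately left out.

```python
def apply_patch(word: str, patch: str) -> str:
    """Apply a patch string to `word`, working from the last character.

    Assumed encoding (not confirmed by this page): each command is two
    characters, a command letter followed by a count letter ('a' = 1,
    'b' = 2, ...). Only 'D' (delete) and '-' (skip) are handled here.
    """
    chars = list(word)
    pos = len(chars) - 1  # cursor starts on the last character
    for i in range(0, len(patch) - 1, 2):
        cmd, param = patch[i], patch[i + 1]
        count = ord(param) - ord("a") + 1
        if cmd == "D":
            # Delete `count` characters ending at the cursor.
            start = max(pos - count + 1, 0)
            del chars[start:pos + 1]
            pos = start - 1
        elif cmd == "-":
            # Skip `count` characters toward the start of the word.
            pos -= count
        else:
            raise ValueError(f"command {cmd!r} is not covered by this sketch")
    return "".join(chars)


stem = apply_patch("teacher", "Db")
```

    Here stem holds teach, matching the example above: the cursor starts on the final r and the D command removes two characters.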

    DiffIt

    The DiffIt class is a means to generate patch commands from an already prepared stemmer table.

    Gener

    The Gener object helps discard nodes that break the reduction effort and defends the structure against large reductions.

    Lift

    The Lift class is a data structure that is a variation of a Patricia trie.

    Lift's raison d'être is to implement reduction of the trie via the Lift-Up method, which makes the data structure less liable to overstemming.

    MultiTrie

    The MultiTrie is a Trie of Tries. It stores words and their associated patch commands. The MultiTrie handles patch commands individually (each command by itself).
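
    One way to picture "each command by itself" is a list of lookup tables in which table i holds the i-th command of every key. The Python sketch below is an illustration under stated assumptions, not the Egothor API: plain dictionaries stand in for the inner tries, the names MultiTableSketch, add, and get_fully are hypothetical, and each command is assumed to be two characters long (as in Db).

```python
class MultiTableSketch:
    """Stores the i-th command of each key in the i-th table.

    Plain dicts stand in for the inner tries (an assumption made to keep
    the sketch short).
    """

    def __init__(self):
        self.tables = []  # tables[i] maps key -> i-th two-character command

    def add(self, key, patch):
        commands = [patch[i:i + 2] for i in range(0, len(patch), 2)]
        while len(self.tables) < len(commands):
            self.tables.append({})
        for table, command in zip(self.tables, commands):
            table[key] = command
        # Drop stale commands left over from a longer, earlier patch.
        for table in self.tables[len(commands):]:
            table.pop(key, None)

    def get_fully(self, key):
        parts = []
        for table in self.tables:
            if key not in table:
                break
            parts.append(table[key])
        return "".join(parts)


multi = MultiTableSketch()
multi.add("teacher", "Db")
multi.add("example", "-aDb")
```

    Reassembling a patch is then a matter of reading one command per table until a table has no entry for the key.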

    MultiTrie2

    The MultiTrie2 is a Trie of Tries.

    It stores words and their associated patch commands. The MultiTrie2 handles patch commands broken into their constituent parts, as a MultiTrie does, but the commands are delimited by the skip command.

    Optimizer

    The Optimizer class is a Trie that will be reduced (have empty rows removed).

    The reduction will be made by joining two rows where the first is a subset of the second.
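
    The subset rule can be sketched in a few lines of Python. This is a simplified illustration, not the Optimizer implementation: rows are modeled as dictionaries from a character to an opaque cell value, references between rows are not rewritten, and the names is_subset and join_subset_rows are hypothetical.

```python
def is_subset(first, second):
    """True if every non-null cell of `first` appears unchanged in `second`."""
    return all(ch in second and second[ch] == val for ch, val in first.items())


def join_subset_rows(rows):
    """Join each row into a row that contains it.

    Returns the reduced row list and a map from old row index to new index.
    """
    kept = []
    remap = {}
    for index, row in enumerate(rows):
        for new_index, other in enumerate(kept):
            if is_subset(row, other):
                break  # the kept row already covers this one
            if is_subset(other, row):
                kept[new_index] = dict(row)  # this row covers the kept one
                break
        else:
            kept.append(dict(row))
            new_index = len(kept) - 1
        remap[index] = new_index
    return kept, remap


rows = [{"a": "Db"}, {"a": "Db", "b": "Dc"}, {}, {"a": "Da"}]
reduced, remap = join_subset_rows(rows)
```

    In this example the first three rows collapse into one, because the first row and the empty row are both subsets of the second. The last row stays separate, since its value for a differs.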

    Optimizer2

    The Optimizer2 class is a Trie that will be reduced (have empty rows removed).

    Rows may be joined when there is no collision between their non-null values. This curtails the information loss seen in Optimizer, which can leave the stemmer unable to recognize words, so the stemmer still recognizes the words for which the original trie was built. Use of this class allows the stemmer to be self-teaching.
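
    A minimal Python sketch of this joining rule follows. It reads "collision" as two different non-null values in the same column, which is an interpretation rather than something this page states. Rows are modeled as dictionaries from a character to an opaque cell value, and the names can_join and join_rows are hypothetical, not part of the Egothor API.

```python
def can_join(first, second):
    """True if no column holds two different non-null values."""
    return all(second[ch] == val for ch, val in first.items() if ch in second)


def join_rows(first, second):
    """Union of two collision-free rows; refuses to join colliding rows."""
    if not can_join(first, second):
        raise ValueError("rows collide and cannot be joined")
    merged = dict(second)
    merged.update(first)
    return merged


joined = join_rows({"a": "Db"}, {"b": "Dc"})
collides = not can_join({"a": "Db"}, {"a": "Da"})
```

    The two rows joined here share no column, so neither is a subset of the other, yet they can still be combined without losing any stored value.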

    Reduce

    The Reduce object is used to remove gaps in a Trie which stores a dictionary.

    Row

    The Row class represents a row in a matrix representation of a Trie.

    Trie

    A Trie is used to store a dictionary of words and their stems.

    More precisely, what is stored are words with their respective patch commands. A trie can be termed forward (keys read from left to right) or backward (keys read from right to left). This property will vary depending on the language for which a Trie is constructed.
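
    The difference between a forward and a backward trie can be shown with a small Python sketch. It is not the Egothor Trie: nested dictionaries replace the row and cell structure, only exact lookups are supported, and the names TrieSketch, add, and get are hypothetical.

```python
class TrieSketch:
    """Maps words to patch commands, reading keys in either direction."""

    _END = ""  # marker key for the stored patch; never equals a character

    def __init__(self, forward=True):
        self.forward = forward
        self.root = {}

    def _key(self, word):
        # A backward trie reads the key from right to left.
        return word if self.forward else word[::-1]

    def add(self, word, patch):
        node = self.root
        for ch in self._key(word):
            node = node.setdefault(ch, {})
        node[self._END] = patch

    def get(self, word):
        node = self.root
        for ch in self._key(word):
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self._END)


backward = TrieSketch(forward=False)
backward.add("teacher", "Db")
forward = TrieSketch(forward=True)
forward.add("teacher", "Db")
```

    Both tries return Db for teacher, but the backward trie branches first on the final r, so words sharing a suffix share a path, while the forward trie branches first on the initial t.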

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)