Namespace Egothor.Stemmer
Egothor stemmer API.
Classes
Cell
Compile
The Compile class is used to compile a stemmer table.
LUCENENET specific: In the Java implementation, this class' Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line, so we provide a .NET tool, lucene-cli, with a command that maps to that method: analysis stempel-compile-stems
Diff
The Diff object generates a patch string.
A patch string is actually a command to a stemmer telling it how to reduce a word to its root. For example, to reduce the word teacher to its root teach the patch string Db would be generated. This command tells the stemmer to delete the last 2 characters from the word teacher to reach the stem (the patch commands are applied starting from the last character in order to save
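The teacher/teach example can be traced with a minimal sketch. It is not the library's implementation: `apply_delete_patch` is a hypothetical helper that models only the delete command, and it assumes each command is a two-character pair whose second character encodes a count (`a` = 1, `b` = 2), which is consistent with `Db` deleting two characters.

```python
def apply_delete_patch(word: str, patch: str) -> str:
    """Apply a patch made only of 'D' (delete) commands to the end of a word.

    Assumption: each command is two characters, an operation letter followed
    by a count letter where 'a' means 1, 'b' means 2, and so on.
    """
    if len(patch) % 2 != 0:
        raise ValueError("patch must be a sequence of two-character commands")
    result = word
    for i in range(0, len(patch), 2):
        op, param = patch[i], patch[i + 1]
        if op != "D":
            # Other commands are deliberately not modeled in this sketch.
            raise ValueError(f"unsupported command: {op!r}")
        count = ord(param) - ord("a") + 1
        if count > len(result):
            raise ValueError("patch deletes more characters than the word has")
        # Deletion starts from the last character of the word.
        result = result[: len(result) - count]
    return result


stem = apply_delete_patch("teacher", "Db")
```

Here `stem` holds `teach`, matching the example in the description.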
DiffIt
The DiffIt class is a means to generate patch commands from an already prepared stemmer table.
LUCENENET specific: In the Java implementation, this class' Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line, so we provide a .NET tool, lucene-cli, with a command that maps to that method: analysis stempel-patch-stems
Gener
The Gener object helps in the discarding of nodes which break the reduction effort and defend the structure against large reductions.
Lift
The Lift class is a data structure that is a variation of a Patricia trie.
Lift's raison d'être is to implement reduction of the trie via the Lift-Up method, which makes the data structure less liable to overstemming.
MultiTrie
The MultiTrie is a Trie of Tries. It stores words and their associated patch commands. The MultiTrie handles patch commands individually (each command by itself).
MultiTrie2
The MultiTrie2 is a Trie of Tries.
It stores words and their associated patch commands. The MultiTrie2 handles patch commands broken into their constituent parts, as a MultiTrie does, but the commands are delimited by the skip command.
Optimizer
The Optimizer class is a Trie that will be reduced (have empty rows removed).
The reduction will be made by joining two rows where the first is a subset of the second.
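The subset rule can be illustrated with a minimal sketch. It is not the Optimizer's actual code: rows are modeled here as plain dictionaries from a character to a value, and `is_subset` and `join_subset_rows` are hypothetical names. A real implementation would also have to redirect references that pointed at a removed row, which this sketch leaves out.

```python
def is_subset(first: dict, second: dict) -> bool:
    """True if every entry of `first` appears in `second` with the same value."""
    return all(key in second and second[key] == value
               for key, value in first.items())


def join_subset_rows(rows: list) -> list:
    """Single greedy pass that drops any row contained in another kept row.

    An empty row is a subset of every row, so empty rows disappear as soon
    as at least one other row exists.
    """
    kept = []
    for row in rows:
        for i, existing in enumerate(kept):
            if is_subset(row, existing):
                break  # row adds nothing beyond what `existing` holds
            if is_subset(existing, row):
                kept[i] = row  # row covers `existing`, so it takes its place
                break
        else:
            kept.append(row)
    return kept


reduced = join_subset_rows([{"a": 1}, {"a": 1, "b": 2}, {}, {"c": 3}])
```

In this run the first row is absorbed by the second and the empty row vanishes, leaving two rows.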
Optimizer2
The Optimizer2 class is a Trie that will be reduced (have empty rows removed).
This is the result of allowing a joining of rows when there is no collision between non-null values in the rows. Information loss, resulting in the stemmer not being able to recognize words (as in Optimizer), is curtailed, allowing the stemmer to recognize words for which the original trie was built. Use of this class allows the stemmer to be self-teaching.
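One possible reading of the collision rule is sketched below. It rests on assumptions that the description does not confirm: rows are modeled as dictionaries whose values may be `None`, a collision means two different non-`None` values under the same key, and `collides` and `join_rows` are hypothetical names rather than members of the class.

```python
def collides(first: dict, second: dict) -> bool:
    """True if some shared key holds two different non-None values."""
    for key in first.keys() & second.keys():
        a, b = first[key], second[key]
        if a is not None and b is not None and a != b:
            return True
    return False


def join_rows(first: dict, second: dict):
    """Return the union of two rows, or None if they collide."""
    if collides(first, second):
        return None
    merged = dict(second)
    for key, value in first.items():
        # A non-None value fills the slot; otherwise keep what `second` had.
        merged[key] = value if value is not None else merged.get(key)
    return merged


joined = join_rows({"a": 1, "b": None}, {"b": 2, "c": 3})
rejected = join_rows({"a": 1}, {"a": 2})
```

Because the join is refused whenever two real values disagree, no entry of either row is overwritten, which is the property the description credits with curtailing information loss.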
Reduce
The Reduce object is used to remove gaps in a Trie which stores a dictionary.
Row
The Row class represents a row in a matrix representation of a Trie.
Trie
A Trie is used to store a dictionary of words and their stems.
More precisely, the Trie stores words together with their respective patch commands. A Trie can be termed forward (keys read from left to right) or backward (keys read from right to left). This property will vary depending on the language for which a Trie is constructed.
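The difference between forward and backward keys can be shown with a small sketch. `PatchTrie` is a hypothetical, dictionary-based stand-in and not the library's Trie: it supports exact-match lookup only and ignores the row and cell layout used by the real classes.

```python
_PATCH = object()  # sentinel key marking the patch stored at a node


class PatchTrie:
    """Toy trie mapping words to patch commands, read forward or backward."""

    def __init__(self, forward: bool = True):
        self.forward = forward
        self.root = {}

    def _key(self, word: str) -> str:
        # A backward trie reads keys from right to left.
        return word if self.forward else word[::-1]

    def add(self, word: str, patch: str) -> None:
        node = self.root
        for ch in self._key(word):
            node = node.setdefault(ch, {})
        node[_PATCH] = patch

    def get(self, word: str):
        node = self.root
        for ch in self._key(word):
            node = node.get(ch)
            if node is None:
                return None
        return node.get(_PATCH)


backward = PatchTrie(forward=False)
backward.add("teacher", "Db")
patch = backward.get("teacher")
```

In the backward trie the first level is keyed by the last letter of each word, so words that share a suffix share a path from the root.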