
    Class RSLPStemmerBase

    Base class for stemmers that use a set of RSLP-like stemming steps.

    RSLP (Removedor de Sufixos da Língua Portuguesa) is an algorithm originally designed for stemming Portuguese, described in the paper "A Stemming Algorithm for the Portuguese Language" by Orengo et al.

    Since then, a plural-only modification (RSLP-S) and a modification for the Galician language have been implemented. This class parses a configuration file that describes RSLPStemmerBase.Steps, where each RSLPStemmerBase.Step contains a set of RSLPStemmerBase.Rules.

    The general rule format is:

    { "suffix", N, "replacement", { "exception1", "exception2", ...}}

    where:

    • suffix is the suffix to be removed (such as "inho").
    • N is the minimum stem size, where the stem is the candidate stem left after removing the suffix (but before appending the replacement!).
    • replacement is an optional string to append after removing the suffix. This can be the empty string.
    • exceptions is an optional list of exceptions, patterns that should not be stemmed. These patterns can be specified as whole word or suffix (ends-with) patterns, depending upon the exceptions format flag in the step header.
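
    The rule fields above can be illustrated with a minimal sketch. SketchRule is a hypothetical type written for this page, not the RSLPStemmerBase.Rule class, and the wholeWordExceptions argument stands in for the exceptions format flag of the enclosing step. It assumes that a word hitting an exception is returned unchanged.

```csharp
using System;

// Hypothetical illustration type; NOT the Lucene.NET RSLPStemmerBase.Rule class.
public sealed class SketchRule
{
    private readonly string suffix;
    private readonly int minStemSize;
    private readonly string replacement;
    private readonly string[] exceptions;
    private readonly bool wholeWordExceptions;

    public SketchRule(string suffix, int minStemSize, string replacement,
                      string[] exceptions, bool wholeWordExceptions)
    {
        this.suffix = suffix;
        this.minStemSize = minStemSize;
        this.replacement = replacement;
        this.exceptions = exceptions;
        this.wholeWordExceptions = wholeWordExceptions;
    }

    // Returns the stemmed word, or the word unchanged when the rule does not apply.
    public string Apply(string word)
    {
        if (!word.EndsWith(suffix, StringComparison.Ordinal))
        {
            return word;
        }

        // The candidate stem is measured before the replacement is appended.
        int stemLength = word.Length - suffix.Length;
        if (stemLength < minStemSize)
        {
            return word;
        }

        foreach (string exception in exceptions)
        {
            bool hit = wholeWordExceptions
                ? word.Equals(exception, StringComparison.Ordinal)
                : word.EndsWith(exception, StringComparison.Ordinal);
            if (hit)
            {
                return word; // assumption: exceptions are left unstemmed
            }
        }

        return word.Substring(0, stemLength) + replacement;
    }
}
```

    With the illustrative rule new SketchRule("inho", 3, "", new[] { "caminho" }, true), Apply("livrinho") yields "livr", while "caminho" (an exception) and "pinho" (the candidate stem "p" is shorter than 3) come back unchanged.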

    A step is an ordered list of rules, with a structure in this format:

    { "name", N, B, { "cond1", "cond2", ... } ... rules ... };
    where:
    • name is a name for the step (such as "Plural").
    • N is the minimum word size. Words shorter than this length bypass the step completely, as an optimization. Note: N can be zero; in that case, this implementation automatically calculates the appropriate value from the underlying rules.
    • B is a "boolean" flag specifying how exceptions in the rules are matched. A value of 1 indicates whole-word pattern matching, a value of 0 indicates that exceptions are actually suffixes and should be matched with ends-with.
    • conds is an optional list of conditions for entering the step at all. If the list is non-empty, a word must end with one of these conditions; otherwise it bypasses the step completely, as an optimization.
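
    A step's gating checks can be sketched in the same spirit. SketchStep is a hypothetical type, not the RSLPStemmerBase.Step class. Two points are assumptions of this sketch rather than statements from this page: when N is zero, the minimum word size is derived as the smallest (minimum stem size + suffix length) over the rules, and the first rule that matches in list order is the one applied.

```csharp
using System;

// Hypothetical illustration type; NOT the Lucene.NET RSLPStemmerBase.Step class.
public sealed class SketchStep
{
    private readonly int minWordSize;
    private readonly bool wholeWordExceptions;
    private readonly string[] conditions;
    private readonly (string Suffix, int MinStem, string Replacement, string[] Exceptions)[] rules;

    public SketchStep(int minWordSize, bool wholeWordExceptions, string[] conditions,
        (string Suffix, int MinStem, string Replacement, string[] Exceptions)[] rules)
    {
        if (minWordSize == 0 && rules.Length > 0)
        {
            // Assumption: derive N as the shortest word any rule could match.
            minWordSize = int.MaxValue;
            foreach (var rule in rules)
            {
                minWordSize = Math.Min(minWordSize, rule.MinStem + rule.Suffix.Length);
            }
        }
        this.minWordSize = minWordSize;
        this.wholeWordExceptions = wholeWordExceptions;
        this.conditions = conditions;
        this.rules = rules;
    }

    public string Apply(string word)
    {
        // Words shorter than N bypass the step.
        if (word.Length < minWordSize) return word;

        // A non-empty condition list requires the word to end with one entry.
        if (conditions.Length > 0 && !EndsWithAny(word, conditions)) return word;

        foreach (var rule in rules)
        {
            if (!word.EndsWith(rule.Suffix, StringComparison.Ordinal)) continue;
            int stemLength = word.Length - rule.Suffix.Length;
            if (stemLength < rule.MinStem) continue;
            if (IsException(word, rule.Exceptions)) continue;
            // Assumption: the first matching rule wins.
            return word.Substring(0, stemLength) + rule.Replacement;
        }
        return word;
    }

    private bool IsException(string word, string[] exceptions)
    {
        foreach (string e in exceptions)
        {
            // B = 1: whole-word match; B = 0: ends-with match.
            bool hit = wholeWordExceptions
                ? word.Equals(e, StringComparison.Ordinal)
                : word.EndsWith(e, StringComparison.Ordinal);
            if (hit) return true;
        }
        return false;
    }

    private static bool EndsWithAny(string word, string[] suffixes)
    {
        foreach (string s in suffixes)
        {
            if (word.EndsWith(s, StringComparison.Ordinal)) return true;
        }
        return false;
    }
}
```

    For an illustrative "Plural"-style step with N = 3, B = 1, the condition "s", and the rules ("ns", 1, "m") and ("s", 2, "", { "pires" }), Apply("bons") gives "bom" and Apply("casas") gives "casa". The words "pires" (exception), "casa" (fails the condition) and "os" (shorter than N) come back unchanged.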

    RSLP description

    This is a Lucene.NET INTERNAL API, use at your own risk
    Inheritance
    System.Object
    RSLPStemmerBase
    GalicianMinimalStemmer
    GalicianStemmer
    PortugueseMinimalStemmer
    PortugueseStemmer
    Namespace: Lucene.Net.Analysis.Pt
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public abstract class RSLPStemmerBase : object

    Methods


    Parse(Type, String)

    Parse a resource file into an RSLP stemmer description.

    Declaration
    protected static IDictionary<string, RSLPStemmerBase.Step> Parse(Type clazz, string resource)
    Parameters
    Type Name Description
    Type clazz
    System.String resource
    Returns
    Type Description
    IDictionary<System.String, RSLPStemmerBase.Step>

    A dictionary containing the named RSLPStemmerBase.Steps in this description.
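
    To give a feel for what reading a single rule line in the documented format involves, here is a minimal sketch. SketchRuleParser is a hypothetical helper and is not how Parse is implemented in Lucene.NET. It handles one rule per line, ignores comments and step headers, and assumes that both the replacement and the exception list may be omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; NOT the parser used by RSLPStemmerBase.Parse.
public static class SketchRuleParser
{
    // Matches: { "suffix", N [, "replacement" [, { "exc1", "exc2", ... }]] }
    private static readonly Regex RuleLine = new Regex(
        @"^\s*\{\s*""([^""]*)""\s*,\s*([0-9]+)\s*(?:,\s*""([^""]*)""\s*(?:,\s*\{([^}]*)\}\s*)?)?\}\s*$",
        RegexOptions.CultureInvariant);

    // Matches one quoted entry inside the exception list.
    private static readonly Regex Quoted = new Regex(@"""([^""]*)""");

    public static (string Suffix, int MinStem, string Replacement, string[] Exceptions) ParseRule(string line)
    {
        Match m = RuleLine.Match(line);
        if (!m.Success)
        {
            throw new FormatException("Not a rule line: " + line);
        }

        // An unmatched optional group has an empty Value, so a missing
        // replacement becomes "" and a missing list yields no exceptions.
        var exceptions = new List<string>();
        foreach (Match e in Quoted.Matches(m.Groups[4].Value))
        {
            exceptions.Add(e.Groups[1].Value);
        }

        return (m.Groups[1].Value,
                int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture),
                m.Groups[3].Value,
                exceptions.ToArray());
    }
}
```

    For example, parsing the line { "inho", 3, "", { "caminho", "cominho" } } produces the suffix "inho", a minimum stem size of 3, an empty replacement, and two exceptions.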

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)