Namespace Lucene.Net.Analysis.Pt
Analyzer for Portuguese.
Classes
PortugueseAnalyzer
Lucene.Net.Analysis.Analyzer for Portuguese.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating PortugueseAnalyzer:
- As of 3.6, PortugueseLightStemFilter is used for less aggressive stemming.
PortugueseLightStemFilter
A Lucene.Net.Analysis.TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute before this Lucene.Net.Analysis.TokenStream.
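The ordering requirement can be modeled in a few lines. The Java sketch below is a conceptual illustration only and does not use the Lucene.Net API: `Token`, `KEYWORDS`, `mark`, `stem`, and `stemAll` are hypothetical names, the boolean flag stands in for KeywordAttribute, and the toy `stem` method merely strips a trailing "s". What it shows is that the marking stage has to run before the stemming stage so the stemmer can see which tokens to skip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Conceptual model only: Token, KEYWORDS, mark, stem and stemAll are hypothetical
// names, not Lucene.Net types. The boolean flag plays the role of KeywordAttribute.
final class KeywordMarkerSketch {

    static final class Token {
        final String text;
        final boolean keyword; // true = protected from stemming

        Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    // Terms that must never be stemmed (the role of SetKeywordMarkerFilter's set).
    static final Set<String> KEYWORDS = Set.of("portugueses");

    // Marking stage: runs first and flags every token found in KEYWORDS.
    static List<Token> mark(List<String> terms) {
        List<Token> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Token(t, KEYWORDS.contains(t)));
        }
        return out;
    }

    // Toy stemmer that only strips a trailing "s"; it is NOT a Portuguese stemmer.
    static String stem(String term) {
        return term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Stemming stage: runs after marking and leaves flagged tokens untouched.
    static List<String> stemAll(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token tok : tokens) {
            out.add(tok.keyword ? tok.text : stem(tok.text));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [livro, portugueses]: only the unprotected term is stemmed.
        System.out.println(stemAll(mark(List.of("livros", "portugueses"))));
    }
}
```

The same ordering applies to PortugueseMinimalStemFilter and PortugueseStemFilter.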
PortugueseLightStemFilterFactory
Factory for PortugueseLightStemFilter.
<fieldType name="text_ptlgtstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PortugueseLightStemFilterFactory"/>
  </analyzer>
</fieldType>
PortugueseLightStemmer
Light Stemmer for Portuguese
This stemmer implements the "UniNE" algorithm described in Light Stemming Approaches for the French, Portuguese, German and Hungarian Languages (Jacques Savoy).
PortugueseMinimalStemFilter
A Lucene.Net.Analysis.TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute before this Lucene.Net.Analysis.TokenStream.
PortugueseMinimalStemFilterFactory
Factory for PortugueseMinimalStemFilter.
<fieldType name="text_ptminstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PortugueseMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
PortugueseMinimalStemmer
Minimal Stemmer for Portuguese
This follows the "RSLP-S" algorithm presented in A Study on the Use of Stemming for Monolingual Ad-Hoc Portuguese Information Retrieval (Orengo et al.), which consists only of the plural-reduction step of the RSLP algorithm from A Stemming Algorithm for the Portuguese Language (Orengo et al.).
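A single plural-reduction step can be sketched in Java without any Lucene.Net dependency. The three rules below are an assumed, illustrative subset written in the RSLP rule shape (suffix, minimum stem size, replacement, whole-word exceptions); they are not the rule set PortugueseMinimalStemmer actually loads, and `PluralStepSketch`, `Rule`, and `reducePlural` are hypothetical names. Rules are tried in order and the first match wins, so the more specific suffixes come before the bare "s".

```java
import java.util.List;
import java.util.Set;

// Sketch of one RSLP-style plural step. The rules are an assumed, illustrative
// subset; they are not the full rule set used by PortugueseMinimalStemmer.
final class PluralStepSketch {

    static final class Rule {
        final String suffix;          // suffix to remove
        final int minStem;            // min length left after removing the suffix
        final String replacement;     // appended after removal (may be "")
        final Set<String> exceptions; // whole words this rule must not touch

        Rule(String suffix, int minStem, String replacement, Set<String> exceptions) {
            this.suffix = suffix;
            this.minStem = minStem;
            this.replacement = replacement;
            this.exceptions = exceptions;
        }

        boolean matches(String word) {
            return word.endsWith(suffix)
                    && word.length() - suffix.length() >= minStem
                    && !exceptions.contains(word);
        }

        String apply(String word) {
            return word.substring(0, word.length() - suffix.length()) + replacement;
        }
    }

    // Order matters: more specific suffixes come before the bare "s".
    static final List<Rule> RULES = List.of(
            new Rule("ns", 1, "m", Set.of()),                       // bons -> bom
            new Rule("ais", 1, "al", Set.of("cais", "mais")),       // jornais -> jornal
            new Rule("s", 2, "", Set.of("cais", "mais", "menos"))); // livros -> livro

    static String reducePlural(String word) {
        if (!word.endsWith("s")) {
            return word; // step condition: only words ending in "s" enter the step
        }
        for (Rule rule : RULES) {
            if (rule.matches(word)) {
                return rule.apply(word); // the first matching rule wins
            }
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : List.of("bons", "jornais", "mais", "livros", "os", "casa")) {
            System.out.println(w + " -> " + reducePlural(w));
        }
    }
}
```

In this sketch "mais" is listed as an exception on both the "ais" and "s" rules: when an exception blocks one rule the loop simply moves on to the next, so without the second entry "mais" would fall through to the bare "s" rule and become "mai". Likewise "os" is left alone because removing "s" would leave a stem of length 1, below that rule's minimum of 2.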
PortugueseStemFilter
A Lucene.Net.Analysis.TokenFilter that applies PortugueseStemmer to stem Portuguese words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom Lucene.Net.Analysis.TokenFilter that sets the KeywordAttribute before this Lucene.Net.Analysis.TokenStream.
PortugueseStemFilterFactory
Factory for PortugueseStemFilter.
<fieldType name="text_ptstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PortugueseStemFilterFactory"/>
  </analyzer>
</fieldType>
PortugueseStemmer
Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm. This is sometimes also referred to as the Orengo stemmer.
RSLPStemmerBase
Base class for stemmers that use a set of RSLP-like stemming steps.
RSLP (Removedor de Sufixos da Lingua Portuguesa) is an algorithm originally designed for stemming Portuguese, described in the paper A Stemming Algorithm for the Portuguese Language, Orengo et al.
Since then, a plural-only modification (RSLP-S) and a modification for the Galician language have also been implemented. This class parses a configuration file that describes RSLPStemmerBase.Steps, where each RSLPStemmerBase.Step contains a set of RSLPStemmerBase.Rules.
The general rule format is:
{ "suffix", N, "replacement", { "exception1", "exception2", ...}}
where:
- suffix is the suffix to be removed (such as "inho").
- N is the minimum stem size, where the stem is the candidate stem after removing the suffix (but before appending the replacement!).
- replacement is a string to append after removing the suffix. This can be the empty string.
- exceptions is an optional list of exceptions: patterns that should not be stemmed. These patterns can be specified as whole-word or suffix (ends-with) patterns, depending on the exception-format flag in the step header.
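Reading this format is mostly pattern matching. The following Java sketch parses one rule, written on a single line, into its four fields. The class names, the regular expression, and the one-rule-per-line assumption are all illustrative; this is not how RSLPStemmerBase itself reads its configuration file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a single rule line; not Lucene.Net's actual parser.
final class RuleParserSketch {

    static final class ParsedRule {
        final String suffix;
        final int minStem;
        final String replacement;
        final List<String> exceptions;

        ParsedRule(String suffix, int minStem, String replacement, List<String> exceptions) {
            this.suffix = suffix;
            this.minStem = minStem;
            this.replacement = replacement;
            this.exceptions = exceptions;
        }
    }

    // { "suffix", N, "replacement" [, { "exc1", "exc2", ... }] } plus an optional trailing , or ;
    private static final Pattern RULE = Pattern.compile(
            "^\\{\\s*\"([^\"]*)\"\\s*,\\s*(\\d+)\\s*,\\s*\"([^\"]*)\"\\s*(?:,\\s*\\{(.*)\\}\\s*)?\\}\\s*[,;]?\\s*$");

    // Any double-quoted string inside the exception list.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static ParsedRule parse(String line) {
        Matcher m = RULE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a rule: " + line);
        }
        List<String> exceptions = new ArrayList<>();
        if (m.group(4) != null) { // the exception list is optional
            Matcher q = QUOTED.matcher(m.group(4));
            while (q.find()) {
                exceptions.add(q.group(1));
            }
        }
        return new ParsedRule(m.group(1), Integer.parseInt(m.group(2)), m.group(3), exceptions);
    }

    public static void main(String[] args) {
        ParsedRule r = parse("{ \"ais\", 1, \"al\", { \"cais\", \"mais\" } },");
        // Prints: ais 1 "al" [cais, mais]
        System.out.println(r.suffix + " " + r.minStem + " \"" + r.replacement + "\" " + r.exceptions);
    }
}
```

A line such as { "s", 2, "" } yields an empty replacement and no exceptions, while anything that does not fit the shape is rejected with an IllegalArgumentException.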
A step is an ordered list of rules, with a structure in this format:
{ "name", N, B, { "cond1", "cond2", ... } ... rules ... };
where:
- name is a name for the step (such as "Plural").
- N is the minimum word size. Words shorter than this bypass the step completely, as an optimization. Note: N can be zero; in this case, this implementation automatically calculates the appropriate value from the underlying rules.
- B is a "boolean" flag specifying how exceptions in the rules are matched. A value of 1 indicates whole-word pattern matching; a value of 0 indicates that exceptions are actually suffixes and should be matched with ends-with.
- conds is an optional list of conditions for entering the step at all. If the list is non-empty, a word must end with one of these conditions, or it bypasses the step completely as an optimization.
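The step-level fields can be modeled the same way. This Java sketch is a hypothetical model, not the RSLPStemmerBase.Step implementation: when N is 0 it derives the minimum word size as the smallest suffix length plus minimum stem size over all rules (one reasonable reading of "the appropriate value"), it checks the conds before trying any rule, and it uses B to choose between whole-word and ends-with exception matching. The single rule is an assumed example.

```java
import java.util.List;

// Hypothetical model of a step header { "name", N, B, { conds } } and its rules.
final class StepSketch {

    static final class Rule {
        final String suffix;
        final int minStem;
        final String replacement;
        final List<String> exceptions;

        Rule(String suffix, int minStem, String replacement, List<String> exceptions) {
            this.suffix = suffix;
            this.minStem = minStem;
            this.replacement = replacement;
            this.exceptions = exceptions;
        }
    }

    final String name;
    final int minWordSize;             // N (derived from the rules when given as 0)
    final boolean wholeWordExceptions; // B: true for 1 (whole word), false for 0 (ends-with)
    final List<String> conds;          // word must end with one of these, if non-empty
    final List<Rule> rules;            // tried in order; the first match wins

    StepSketch(String name, int n, boolean b, List<String> conds, List<Rule> rules) {
        this.name = name;
        this.wholeWordExceptions = b;
        this.conds = conds;
        this.rules = rules;
        if (n == 0) {
            // A rule can only match a word at least as long as its suffix plus its minimum stem.
            int min = Integer.MAX_VALUE;
            for (Rule r : rules) {
                min = Math.min(min, r.suffix.length() + r.minStem);
            }
            n = min;
        }
        this.minWordSize = n;
    }

    private boolean isException(Rule r, String word) {
        for (String e : r.exceptions) {
            if (wholeWordExceptions ? word.equals(e) : word.endsWith(e)) {
                return true;
            }
        }
        return false;
    }

    String apply(String word) {
        if (word.length() < minWordSize) {
            return word; // too short for any rule: bypass the step
        }
        if (!conds.isEmpty() && conds.stream().noneMatch(word::endsWith)) {
            return word; // no entry condition matched: bypass the step
        }
        for (Rule r : rules) {
            boolean fits = word.endsWith(r.suffix) && word.length() - r.suffix.length() >= r.minStem;
            if (fits && !isException(r, word)) {
                return word.substring(0, word.length() - r.suffix.length()) + r.replacement;
            }
        }
        return word;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule("ais", 1, "al", List.of("mais")));
        StepSketch wholeWord = new StepSketch("Plural", 0, true, List.of("s"), rules);
        StepSketch suffixes = new StepSketch("Plural", 0, false, List.of("s"), rules);
        System.out.println(wholeWord.minWordSize);      // 4 = "ais".length() + 1
        System.out.println(wholeWord.apply("jornais")); // jornal
        System.out.println(wholeWord.apply("demais"));  // demal
        System.out.println(suffixes.apply("demais"));   // demais
    }
}
```

With whole-word matching, the exception "mais" does not protect "demais" ("too much"), which becomes "demal"; with ends-with matching it is protected, at the cost of also blocking genuine plurals such as "normais" that happen to end in "mais".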
RSLPStemmerBase.Rule
A basic rule, with no exceptions.
RSLPStemmerBase.RuleWithSetExceptions
A rule with a set of whole-word exceptions.
RSLPStemmerBase.RuleWithSuffixExceptions
A rule with a set of exceptional suffixes.
RSLPStemmerBase.Step
A step containing a list of rules.