Namespace Lucene.Net.Analysis.En
Analyzer for English.
Classes
EnglishAnalyzer
Analyzer for English.
EnglishMinimalStemFilter
A TokenFilter that applies EnglishMinimalStemmer to stem English words.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.
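The keyword mechanism can be pictured with a small stand-alone model. The sketch below is not Lucene.Net's attribute API. It assumes a hypothetical `MarkedToken` record with an `IsKeyword` flag in place of the KeywordAttribute. `MarkKeywords` plays roughly the role a keyword marker filter plays earlier in the chain, and `Stem` skips any token that is already flagged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical token type standing in for a term plus its KeywordAttribute.
public record MarkedToken(string Term, bool IsKeyword);

public static class KeywordAwareStemmingSketch
{
    // Marks every term found in protectedTerms as a keyword
    // (roughly the role SetKeywordMarkerFilter plays in a real chain).
    public static IEnumerable<MarkedToken> MarkKeywords(IEnumerable<string> terms, ISet<string> protectedTerms) =>
        terms.Select(term => new MarkedToken(term, protectedTerms.Contains(term)));

    // Stems every token except those already marked as keywords.
    public static List<MarkedToken> Stem(IEnumerable<MarkedToken> tokens, Func<string, string> stem) =>
        tokens.Select(t => t.IsKeyword ? t : t with { Term = stem(t.Term) }).ToList();
}
```

The following usage example uses a trivial "drop a trailing s" stemmer. It turns "cats" into "cat" and leaves the protected term "news" unchanged:

    var tokens = KeywordAwareStemmingSketch.MarkKeywords(new[] { "cats", "news" }, new HashSet<string> { "news" });
    var stemmed = KeywordAwareStemmingSketch.Stem(tokens, w => w.EndsWith("s") ? w.Substring(0, w.Length - 1) : w);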
EnglishMinimalStemFilterFactory
Factory for EnglishMinimalStemFilter.
<fieldType name="text_enminstem" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
</fieldType>
EnglishMinimalStemmer
Minimal plural stemmer for English.
This stemmer implements the "S-Stemmer" from "How Effective Is Suffixing?" by Donna Harman.
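The S-Stemmer consists of three suffix rules, and only the first rule whose suffix matches is used. The sketch below applies those rules to a lowercase word. It is a hypothetical illustration and does not reproduce the EnglishMinimalStemmer source. It rests on two assumptions. First, an excepted ending (for example "-oes") leaves the word unchanged. Second, words shorter than three letters are never stemmed.

```csharp
using System;

// Hypothetical illustration of Harman's S-Stemmer rules; not Lucene.Net's EnglishMinimalStemmer code.
public static class SStemmerSketch
{
    private static bool Ends(string word, string suffix) =>
        word.EndsWith(suffix, StringComparison.Ordinal);

    // Stems a lowercase word. Only the first rule whose suffix matches is considered;
    // an excepted ending leaves the word unchanged (an interpretation assumed by this sketch).
    public static string Stem(string word)
    {
        // Extra guard assumed here: words shorter than three letters are left alone.
        if (word.Length < 3)
            return word;

        if (Ends(word, "ies"))
        {
            // Rule 1: "-ies" -> "-y", except "-eies" and "-aies".
            return Ends(word, "eies") || Ends(word, "aies")
                ? word
                : word.Substring(0, word.Length - 3) + "y";
        }

        if (Ends(word, "es"))
        {
            // Rule 2: "-es" -> "-e" (drop the final "s"), except "-aes", "-ees" and "-oes".
            return Ends(word, "aes") || Ends(word, "ees") || Ends(word, "oes")
                ? word
                : word.Substring(0, word.Length - 1);
        }

        if (Ends(word, "s"))
        {
            // Rule 3: drop the final "s", except "-us" and "-ss".
            return Ends(word, "us") || Ends(word, "ss")
                ? word
                : word.Substring(0, word.Length - 1);
        }

        return word;
    }
}
```

With these rules, "ponies" becomes "pony", "cats" becomes "cat", and "glasses" becomes "glasse". The stemmer only conflates singular and plural forms and does not try to produce dictionary words.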
EnglishPossessiveFilter
TokenFilter that removes possessives (trailing 's) from words.
You must specify the required LuceneVersion compatibility when creating EnglishPossessiveFilter:
- As of 3.6, U+2019 RIGHT SINGLE QUOTATION MARK and U+FF07 FULLWIDTH APOSTROPHE are also treated as apostrophes, in addition to the ASCII apostrophe (U+0027).
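The following sketch shows how possessive removal works on a single term. It is a hypothetical helper, not the filter's source. The boolean `extendedApostrophes` parameter is an assumption that stands in for the LuceneVersion check. When it is true, U+2019 and U+FF07 count as apostrophes alongside U+0027.

```csharp
using System;

// Hypothetical helper illustrating possessive removal; not Lucene.Net's EnglishPossessiveFilter.
public static class PossessiveSketch
{
    // Removes a trailing apostrophe + "s" (or "S") from a single term.
    // extendedApostrophes models the 3.6+ behaviour: U+2019 and U+FF07 also count as apostrophes.
    public static string RemovePossessive(string term, bool extendedApostrophes)
    {
        if (term.Length < 2)
            return term;

        char apostrophe = term[term.Length - 2];
        char last = term[term.Length - 1];

        bool isApostrophe = apostrophe == '\''
            || (extendedApostrophes && (apostrophe == '\u2019' || apostrophe == '\uFF07'));

        // Strip the last two characters only when they form a possessive ending.
        return isApostrophe && (last == 's' || last == 'S')
            ? term.Substring(0, term.Length - 2)
            : term;
    }
}
```

For example, "john's" becomes "john" in either mode. "john\u2019s" is shortened only when `extendedApostrophes` is true, and "its" is never changed.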
EnglishPossessiveFilterFactory
Factory for EnglishPossessiveFilter.
<fieldType name="text_enpossessive" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
</analyzer>
</fieldType>
KStemFilter
A high-performance KStem filter for English.
See "Viewing Morphology as an Inference Process" (Krovetz, R., Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 191-203, 1993).
All terms must already be lowercased for this filter to work correctly.
Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, set IsKeyword to true in a previous TokenStream.
Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
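A keyword-repeat step keeps both forms by emitting every token twice. One copy is flagged as a keyword, so a keyword-aware stemmer passes it through unchanged. The other copy is stemmed. The sketch below models this with a hypothetical `RepeatedToken` record and a caller-supplied stem function, and it ignores token positions. Dropping the stemmed copy when stemming leaves the term unchanged is this sketch's own choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type: a term plus a keyword flag.
public record RepeatedToken(string Term, bool IsKeyword);

public static class KeywordRepeatSketch
{
    // Emits every term twice: once flagged as a keyword, once not.
    public static IEnumerable<RepeatedToken> Repeat(IEnumerable<string> terms)
    {
        foreach (var term in terms)
        {
            yield return new RepeatedToken(term, IsKeyword: true);
            yield return new RepeatedToken(term, IsKeyword: false);
        }
    }

    // Keyword copies pass through unchanged; non-keyword copies are stemmed.
    public static List<string> StemKeepingOriginals(IEnumerable<string> terms, Func<string, string> stem)
    {
        var output = new List<string>();
        foreach (var token in Repeat(terms))
        {
            string term = token.IsKeyword ? token.Term : stem(token.Term);
            // Skip the stemmed copy when stemming did not change the term.
            if (!token.IsKeyword && term == token.Term)
                continue;
            output.Add(term);
        }
        return output;
    }
}
```

With a trivial "drop a trailing s" stemmer, the input "cats", "dog" produces "cats", "cat", "dog".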
KStemFilterFactory
Factory for KStemFilter.
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KStemFilterFactory"/>
</analyzer>
</fieldType>
KStemmer
This class implements the KStem algorithm.
PorterStemFilter
Transforms the token stream as per the Porter stemming algorithm.
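The full Porter algorithm runs several steps, many of them conditioned on the "measure" of the word stem. As a small illustration only, the sketch below implements its first sub-step (1a), which rewrites plural endings by longest matching suffix. It is a hypothetical helper, not PorterStemFilter's implementation, and it omits every later step.

```csharp
using System;

// Hypothetical illustration of Porter's step 1a only; not PorterStemFilter's implementation.
public static class PorterStep1aSketch
{
    // Rewrites plural endings of a lowercase word using the longest matching suffix.
    public static string Step1a(string word)
    {
        if (word.EndsWith("sses", StringComparison.Ordinal))
            return word.Substring(0, word.Length - 2); // "sses" -> "ss"
        if (word.EndsWith("ies", StringComparison.Ordinal))
            return word.Substring(0, word.Length - 2); // "ies" -> "i"
        if (word.EndsWith("ss", StringComparison.Ordinal))
            return word;                               // "ss" -> "ss"
        if (word.EndsWith("s", StringComparison.Ordinal))
            return word.Substring(0, word.Length - 1); // "s" -> ""
        return word;
    }
}
```

Step 1a maps "caresses" to "caress", "ponies" to "poni", and "cats" to "cat", and it leaves "caress" unchanged. Later steps continue from these intermediate forms.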
Note: The input to the stemming filter must already be in lower case, so you need to apply LowerCaseFilter or LowerCaseTokenizer earlier in the analysis chain (before this filter) for it to work properly.
To use this filter with other analyzers, you'll want to write an Analyzer class that sets up the TokenStream chain as you want it. To use this with LowerCaseTokenizer, for example, you'd write an analyzer like this:
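using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.En;
using Lucene.Net.Util;
class MyAnalyzer : Analyzer {
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader) {
        // LowerCaseTokenizer lowercases each token, so PorterStemFilter receives lowercase input.
        Tokenizer source = new LowerCaseTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source, new PorterStemFilter(source));
    }
}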
Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, set IsKeyword to true in a previous TokenStream.
Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
PorterStemFilterFactory
Factory for PorterStemFilter.
<fieldType name="text_porterstem" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>