Namespace Lucene.Net.Analysis.En

Analyzer for English.

Classes

EnglishMinimalStemFilter

A TokenFilter that applies EnglishMinimalStemmer to stem English words.

To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.
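The keyword mechanism can be illustrated without Lucene.NET. In this minimal, self-contained C# sketch, tokens are plain (Term, IsKeyword) tuples standing in for the term and its KeywordAttribute. MarkKeywords plays the role of SetKeywordMarkerFilter and StemUnlessKeyword plays the role of a keyword-aware stem filter. All three functions, including the StripS stand-in stemmer, are hypothetical illustrations and not Lucene.NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in stemmer: strips one trailing 's' (not EnglishMinimalStemmer).
static string StripS(string term) =>
    term.Length > 3 && term[term.Length - 1] == 's' && term[term.Length - 2] != 's'
        ? term.Substring(0, term.Length - 1)
        : term;

// Plays the role of SetKeywordMarkerFilter: flags every term found in the protected set.
static List<(string Term, bool IsKeyword)> MarkKeywords(IEnumerable<string> terms, ISet<string> protectedWords) =>
    terms.Select(t => (Term: t, IsKeyword: protectedWords.Contains(t))).ToList();

// Plays the role of a keyword-aware stem filter: only unflagged terms reach the stemmer.
static List<string> StemUnlessKeyword(List<(string Term, bool IsKeyword)> tokens, Func<string, string> stem) =>
    tokens.Select(tok => tok.IsKeyword ? tok.Term : stem(tok.Term)).ToList();

var protectedSet = new HashSet<string> { "series" };
var marked = MarkKeywords(new[] { "tv", "series", "episodes" }, protectedSet);
Console.WriteLine(string.Join(" ", StemUnlessKeyword(marked, StripS)));
```

Running it prints "tv series episode": "series" is protected, so only "episodes" is passed to the stemmer.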

EnglishMinimalStemFilterFactory

Factory for EnglishMinimalStemFilter.

<fieldType name="text_enminstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>

EnglishMinimalStemmer

Minimal plural stemmer for English.

This stemmer implements the "S-Stemmer" from "How Effective Is Suffixing?" by Donna Harman.
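The S-Stemmer's three rules are small enough to sketch directly. The following C# function is an illustrative, self-contained approximation, not the EnglishMinimalStemmer source. It assumes lowercase input and applies only the first rule that matches. One simplification: words ending in "-aies" or "-eies" are left unchanged here rather than falling through to the "-es" rule.

```csharp
using System;

// Sketch of Harman's S-Stemmer rules (assumes lowercase input):
//   "-ies" -> "-y"  unless preceded by 'a' or 'e'        (ponies -> pony)
//   "-es"  -> "-e"  unless preceded by 'a', 'e' or 'o'   (boxes -> boxe)
//   "-s"   -> ""    unless preceded by 'u' or 's'        (cats -> cat, glass stays)
static string SStem(string word)
{
    int len = word.Length;
    // Words shorter than 3 characters, or not ending in 's', are left alone.
    if (len < 3 || word[len - 1] != 's')
        return word;

    char beforeS = word[len - 2];
    if (beforeS == 'u' || beforeS == 's')
        return word;

    if (beforeS == 'e')
    {
        char beforeEs = word[len - 3];
        // "-ies" -> "-y", except after 'a' or 'e'.
        if (len > 3 && beforeEs == 'i' && word[len - 4] != 'a' && word[len - 4] != 'e')
            return word.Substring(0, len - 3) + "y";
        // "-aes", "-ees", "-oes" and the excluded "-aies"/"-eies" cases stay as they are.
        if (beforeEs == 'i' || beforeEs == 'a' || beforeEs == 'o' || beforeEs == 'e')
            return word;
        // Otherwise "-es" -> "-e", which is the same as dropping the final 's'.
    }
    return word.Substring(0, len - 1);
}

foreach (var w in new[] { "cats", "ponies", "glass", "boxes", "toes" })
    Console.WriteLine($"{w} -> {SStem(w)}");
```

For the sample words this prints cats -> cat, ponies -> pony, glass -> glass, boxes -> boxe and toes -> toes, which shows how conservative the S-Stemmer is: it only touches plural-looking endings.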

EnglishPossessiveFilter

TokenFilter that removes possessives (trailing 's) from words.

You must specify the required LuceneVersion compatibility when creating EnglishPossessiveFilter. As of 3.6, U+2019 RIGHT SINGLE QUOTATION MARK and U+FF07 FULLWIDTH APOSTROPHE are also recognized as the apostrophe in a possessive, in addition to the ASCII apostrophe (').
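The possessive rule can be sketched as a plain string function. This is a minimal, self-contained C# illustration, not the EnglishPossessiveFilter source: the boolean recognizeUnicodeApostrophes is a hypothetical stand-in for the LuceneVersion check (true meaning 3.6 or later), and the sketch also accepts an uppercase S, which is an assumption of this example.

```csharp
using System;

// Sketch of possessive stripping: drop a trailing apostrophe + 's' (or 'S').
// recognizeUnicodeApostrophes (hypothetical) stands in for a 3.6+ compatibility
// version: when true, U+2019 and U+FF07 are accepted in place of the ASCII apostrophe.
static string StripPossessive(string term, bool recognizeUnicodeApostrophes)
{
    int len = term.Length;
    if (len < 2)
        return term;

    char apostrophe = term[len - 2];
    char last = term[len - 1];
    bool isApostrophe = apostrophe == '\''
        || (recognizeUnicodeApostrophes && (apostrophe == '\u2019' || apostrophe == '\uFF07'));

    // Only strip when both the apostrophe and the final 's' are present.
    return isApostrophe && (last == 's' || last == 'S') ? term.Substring(0, len - 2) : term;
}

Console.WriteLine(StripPossessive("john's", true));       // john
Console.WriteLine(StripPossessive("john\u2019s", false)); // left unchanged (pre-3.6 behavior)
```

Because the check only looks at the last two characters, the filter is cheap per token; placing it after a lowercasing step is a matter of pipeline design, as in the Solr example below.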

EnglishPossessiveFilterFactory

Factory for EnglishPossessiveFilter.

<fieldType name="text_enpossessive" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>

KStemFilter

A high-performance KStem filter for English.

See "Viewing Morphology as an Inference Process" (Krovetz, R., Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 191-203, 1993).

All terms must already be lowercased for this filter to work correctly.

Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, IsKeyword should be set to true in a previous TokenStream.

Note: For including the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
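The idea behind keeping both forms can be sketched in a few lines: each token is emitted twice, once marked as a keyword (so a keyword-aware stemmer such as KStemFilter passes it through) and once unmarked (so it gets stemmed), and a duplicate-removal step (in Lucene, typically RemoveDuplicatesTokenFilter) drops the second copy when stemming changed nothing. This C# sketch is self-contained and hypothetical: StripS is a toy stand-in, not KStem, and it returns a flat list, whereas the real filters keep both copies at the same token position.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in stemmer: strips one trailing 's' (not the KStem algorithm).
static string StripS(string term) =>
    term.Length > 3 && term[term.Length - 1] == 's' && term[term.Length - 2] != 's'
        ? term.Substring(0, term.Length - 1)
        : term;

// For each term: emit a keyword copy (kept as-is) and a non-keyword copy (stemmed),
// then drop the stemmed copy if it is identical to the original.
static List<string> RepeatStemAndDedup(IEnumerable<string> terms, Func<string, string> stem)
{
    var output = new List<string>();
    foreach (string term in terms)
    {
        // Copy 1: IsKeyword = true, so a keyword-aware stemmer leaves it alone.
        output.Add(term);
        // Copy 2: IsKeyword = false, so the stemmer rewrites it.
        string stemmed = stem(term);
        // Duplicate removal: keep the stemmed copy only when it differs.
        if (stemmed != term)
            output.Add(stemmed);
    }
    return output;
}

Console.WriteLine(string.Join(" ", RepeatStemAndDedup(new[] { "running", "dogs" }, StripS)));
```

This prints "running dogs dog": both "dogs" and "dog" end up indexed, so exact-form and stemmed queries can match the same field.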

KStemFilterFactory

Factory for KStemFilter.

<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

KStemmer

This class implements the KStem algorithm.

PorterStemFilter

Transforms the token stream as per the Porter stemming algorithm.

Note: the input to the stemming filter must already be in lowercase, so you need to place a LowerCaseFilter or LowerCaseTokenizer earlier in the analysis chain (before PorterStemFilter) for this to work properly.
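To give a feel for what the Porter algorithm does, here is a C# sketch of only its first step (1a), which handles plural suffixes. It is a self-contained illustration, not the PorterStemFilter implementation; the full algorithm continues with steps 1b through 5, which are not shown. Like the filter, the sketch assumes lowercase input: an uppercase token such as "CATS" passes through unchanged.

```csharp
using System;

// Step 1a of the Porter algorithm only (plural suffixes); assumes lowercase input.
// The rules are tried in order and the first matching suffix wins.
static string PorterStep1a(string word)
{
    if (word.EndsWith("sses", StringComparison.Ordinal))
        return word.Substring(0, word.Length - 2);   // sses -> ss
    if (word.EndsWith("ies", StringComparison.Ordinal))
        return word.Substring(0, word.Length - 2);   // ies  -> i
    if (word.EndsWith("ss", StringComparison.Ordinal))
        return word;                                 // ss   -> ss
    if (word.EndsWith("s", StringComparison.Ordinal))
        return word.Substring(0, word.Length - 1);   // s    -> (removed)
    return word;
}

foreach (var w in new[] { "caresses", "ponies", "caress", "cats", "CATS" })
    Console.WriteLine($"{w} -> {PorterStep1a(w)}");
```

The output maps caresses to caress, ponies to poni, caress to caress and cats to cat, while CATS is left as-is, which is why lowercasing has to happen first.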

To use this filter with other analyzers, you'll want to write an Analyzer class that sets up the TokenStream chain as you want it. To use this with LowerCaseTokenizer, for example, you'd write an analyzer like this:

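using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.En;
using Lucene.Net.Util;
class MyAnalyzer : Analyzer {
  protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader) {
    // LowerCaseTokenizer splits on non-letters and lowercases, so PorterStemFilter sees lowercase terms.
    Tokenizer source = new LowerCaseTokenizer(LuceneVersion.LUCENE_48, reader);
    return new TokenStreamComponents(source, new PorterStemFilter(source));
  }
}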

Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, IsKeyword should be set to true in a previous TokenStream.

Note: For including the original term as well as the stemmed version, see KeywordRepeatFilterFactory.

PorterStemFilterFactory

Factory for PorterStemFilter.

<fieldType name="text_porterstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>