Namespace Lucene.Net.Analysis.De

Analyzer for German.

Classes

GermanAnalyzer

A Lucene.Net.Analysis.Analyzer for the German language.

Supports an external list of stopwords (words that will not be indexed at all) and an external list of exclusions (words that will be indexed but not stemmed). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.

You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating a GermanAnalyzer.

NOTE: This class uses the same Lucene.Net.Util.LuceneVersion dependent settings as StandardAnalyzer.
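
The stopword and exclusion lists described above might be supplied as in the following minimal sketch. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced; the field name "body", the sample text, and the contents of both word lists are made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.De;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// The compatibility version is required by every GermanAnalyzer constructor.
const LuceneVersion version = LuceneVersion.LUCENE_48;

// Illustrative lists only: a custom stopword set (never indexed) and an
// exclusion set (indexed, but passed through without stemming).
var stopwords = new CharArraySet(version, new[] { "der", "die", "das" }, true);
var exclusions = new CharArraySet(version, new[] { "häuser" }, true);

using var analyzer = new GermanAnalyzer(version, stopwords, exclusions);

// "body" is a hypothetical field name.
using TokenStream stream = analyzer.GetTokenStream("body", "Die Häuser der Stadt");
ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

stream.Reset();
while (stream.IncrementToken())
{
    Console.WriteLine(term.ToString());
}
stream.End();
```

Calling `new GermanAnalyzer(version)` instead keeps the default stopword set and an empty exclusion list.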

GermanLightStemFilter

A Lucene.Net.Analysis.TokenFilter that applies GermanLightStemmer to stem German words.

To prevent terms from being stemmed, place an instance of SetKeywordMarkerFilter, or a custom Lucene.Net.Analysis.TokenFilter that sets the Lucene.Net.Analysis.TokenAttributes.KeywordAttribute, before this Lucene.Net.Analysis.TokenStream in the analysis chain.
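
As a rough sketch of that ordering, the chain below marks protected terms with SetKeywordMarkerFilter before GermanLightStemFilter sees them. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages; the protected word, the field name "body", and the sample text are illustrative only.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.De;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

const LuceneVersion version = LuceneVersion.LUCENE_48;

// Hypothetical set of terms that should reach the index unstemmed.
var protectedTerms = new CharArraySet(version, new[] { "häuser" }, true);

using Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new StandardTokenizer(version, reader);
    TokenStream chain = new LowerCaseFilter(version, source);
    // The keyword marker must come before the stem filter so the
    // KeywordAttribute is already set when stemming is attempted.
    chain = new SetKeywordMarkerFilter(chain, protectedTerms);
    chain = new GermanLightStemFilter(chain);
    return new TokenStreamComponents(source, chain);
});

using TokenStream stream = analyzer.GetTokenStream("body", "Häuser und Autos");
ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

stream.Reset();
while (stream.IncrementToken())
{
    Console.WriteLine(term.ToString());
}
stream.End();
```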

GermanLightStemFilterFactory

Factory for GermanLightStemFilter.

<fieldType name="text_delgtstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanLightStemFilterFactory"/>
  </analyzer>
</fieldType>

GermanLightStemmer

Light Stemmer for German.

This stemmer implements the "UniNE" algorithm described in "Light Stemming Approaches for the French, Portuguese, German and Hungarian Languages" by Jacques Savoy.

GermanMinimalStemFilter

A Lucene.Net.Analysis.TokenFilter that applies GermanMinimalStemmer to stem German words.

GermanMinimalStemFilterFactory

Factory for GermanMinimalStemFilter.

<fieldType name="text_deminstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>

GermanMinimalStemmer

Minimal Stemmer for German.

This stemmer implements the algorithm described in "Morphologie et recherche d'information" by Jacques Savoy.

GermanNormalizationFilter

Normalizes German characters according to the heuristics of the German2 snowball algorithm (http://snowball.tartarus.org/algorithms/german2/stemmer.html). It allows for the fact that ä, ö and ü are sometimes written as ae, oe and ue.

This is useful if you want this normalization without using the German2 stemmer, or without any stemming at all.
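
A normalization-only chain, with no stemmer attached, could look like the sketch below. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages; the field name "body" and the sample text are illustrative, and no particular output is claimed here.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.De;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

const LuceneVersion version = LuceneVersion.LUCENE_48;

using Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new StandardTokenizer(version, reader);
    TokenStream chain = new LowerCaseFilter(version, source);
    // Normalization only: no stem filter follows.
    chain = new GermanNormalizationFilter(chain);
    return new TokenStreamComponents(source, chain);
});

// Both spellings of the same word are fed through the chain so the
// resulting terms can be compared.
using TokenStream stream = analyzer.GetTokenStream("body", "Müller Mueller");
ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

stream.Reset();
while (stream.IncrementToken())
{
    Console.WriteLine(term.ToString());
}
stream.End();
```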

GermanNormalizationFilterFactory

Factory for GermanNormalizationFilter.

<fieldType name="text_denorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

GermanStemFilter

A Lucene.Net.Analysis.TokenFilter that stems German words.

It supports a table of words that should not be stemmed at all. The stemmer can be replaced at runtime after the filter object has been created, as long as the replacement is a GermanStemmer.

GermanStemFilterFactory

Factory for GermanStemFilter.

<fieldType name="text_destem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanStemFilterFactory"/>
  </analyzer>
</fieldType>

GermanStemmer

A stemmer for German words.

The algorithm is based on the report "A Fast and Simple Stemming Algorithm for German Words" by Jörg Caumanns (joerg.caumanns at isst.fhg.de).