Namespace Lucene.Net.Analysis.Morfologik
This package provides a dictionary-driven lemmatization ("accurate stemming") filter and analyzer for the Polish language, based on the Morfologik library developed by Dawid Weiss and Marcin Miłkowski.
For an introduction to Lucene's analysis API, see the Lucene.Net.Analysis namespace documentation.
The MorfologikFilter yields one or more terms for each token. Each of those terms is given the same position in the index.
Classes
MorfologikAnalyzer
A Lucene.Net.Analysis.Analyzer that uses the Morfologik library.
MorfologikFilter
A Lucene.Net.Analysis.TokenFilter that uses the Morfologik library to transform input tokens into lemma and morphosyntactic (POS) tokens. Applies to Polish only.
MorfologikFilter contains a MorphosyntacticTagsAttribute, which provides morphosyntactic annotations for produced lemmas. See the Morfologik documentation for details.
MorfologikFilterFactory
Filter factory for MorfologikFilter.
An explicit resource name of the dictionary (".dict") can be provided via the dictionary attribute, as the example below demonstrates:
<fieldType name="text_mylang" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.MorfologikFilterFactory" dictionary="mylang.dict" />
  </analyzer>
</fieldType>
If the dictionary attribute is not provided, the Polish dictionary is loaded and used by default.
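For comparison, a minimal sketch of the default case: the same field type definition with the dictionary attribute omitted, so that the factory falls back to the Polish dictionary. The field type name text_pl is a hypothetical name chosen for illustration. The tokenizer and the remaining attributes are copied from the example above.

```xml
<!-- Hypothetical field type name; any unique name can be used. -->
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- No dictionary attribute: the default Polish dictionary is loaded. -->
    <filter class="solr.MorfologikFilterFactory" />
  </analyzer>
</fieldType>
```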
See: Morfologik web site