

    Namespace Lucene.Net.Analysis.Morfologik

    This package provides a dictionary-driven lemmatization ("accurate stemming") filter and analyzer for the Polish language, driven by the Morfologik library developed by Dawid Weiss and Marcin Miłkowski.

    For an introduction to Lucene's analysis API, see the Lucene.Net.Analysis package documentation.

    The MorfologikFilter yields one or more terms for each token. Each of those terms is given the same position in the index.
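    The stacking behavior described above can be pictured with a small, self-contained toy model. This is a hedged Java sketch, not the Lucene.NET API: StackedTerm, ToyLemmaStacker, and the in-memory dictionary are hypothetical. It assumes the usual Lucene convention that a position increment of 0 places a term at the same position as the previous term, and, as a simplification, it passes unknown words through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy illustration (hypothetical, not the Lucene.NET API) of a lemmatizing
// filter that emits several terms for one token at a single position.
final class StackedTerm {
    final String term;
    final int positionIncrement; // 1 = next position, 0 = same position as previous term

    StackedTerm(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return term + "(+" + positionIncrement + ")";
    }
}

final class ToyLemmaStacker {
    // Hypothetical in-memory dictionary: surface form -> candidate lemmas.
    private final Map<String, List<String>> dictionary;

    ToyLemmaStacker(Map<String, List<String>> dictionary) {
        this.dictionary = dictionary;
    }

    List<StackedTerm> process(List<String> tokens) {
        List<StackedTerm> out = new ArrayList<>();
        for (String token : tokens) {
            // In this toy, unknown words pass through unchanged as a single term.
            List<String> lemmas = dictionary.getOrDefault(token, List.of(token));
            for (int i = 0; i < lemmas.size(); i++) {
                // Only the first lemma advances the position; the rest are stacked on it.
                out.add(new StackedTerm(lemmas.get(i), i == 0 ? 1 : 0));
            }
        }
        return out;
    }

    // Absolute positions, obtained by summing position increments from -1.
    static List<Integer> positions(List<StackedTerm> terms) {
        List<Integer> result = new ArrayList<>();
        int pos = -1;
        for (StackedTerm t : terms) {
            pos += t.positionIncrement;
            result.add(pos);
        }
        return result;
    }
}
```

    For example, with a toy dictionary mapping "mam" to the lemmas "mieć" and "mama" (the word is a form of both) and "kota" to "kot", the tokens mam, kota, xyz produce mieć, mama, kot, xyz at positions 0, 0, 1, 2. Because both lemmas of "mam" share one position, a query for either lemma matches the original word, and a phrase of either reading followed by "kot" still lines up.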

    Classes

    MorfologikAnalyzer

    A Lucene.Net.Analysis.Analyzer that uses the Morfologik library.

    See: Morfologik project page

    MorfologikFilter

    A Lucene.Net.Analysis.TokenFilter that uses the Morfologik library to transform input tokens into lemma and morphosyntactic (POS) tokens. Applies to Polish only.

    MorfologikFilter contains a MorphosyntacticTagsAttribute, which provides morphosyntactic annotations for produced lemmas. See the Morfologik documentation for details.
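    To make the pairing of lemmas with morphosyntactic annotations concrete, here is a hedged Java toy model rather than the actual MorphosyntacticTagsAttribute API. TaggedLemma, ToyTagger, and lemmasOfClass are hypothetical names, and the tag strings are illustrative: they follow the colon-separated style of Morfologik's Polish tags but are not copied from the dictionary. The point is that per-lemma tags let an ambiguous form be narrowed down by grammatical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical): each dictionary reading pairs a lemma with a
// morphosyntactic tag string.
final class TaggedLemma {
    final String lemma;
    final String tag; // illustrative tag, e.g. "subst:pl:gen:f"

    TaggedLemma(String lemma, String tag) {
        this.lemma = lemma;
        this.tag = tag;
    }
}

final class ToyTagger {
    private final Map<String, List<TaggedLemma>> dictionary;

    ToyTagger(Map<String, List<TaggedLemma>> dictionary) {
        this.dictionary = dictionary;
    }

    // Returns every (lemma, tag) reading of the surface form; empty if unknown.
    List<TaggedLemma> analyze(String surfaceForm) {
        return dictionary.getOrDefault(surfaceForm, List.of());
    }

    // Keeps only lemmas whose tag begins with the given grammatical class,
    // i.e. the part before the first ':' (e.g. "subst" or "verb").
    List<String> lemmasOfClass(String surfaceForm, String grammaticalClass) {
        List<String> result = new ArrayList<>();
        for (TaggedLemma reading : analyze(surfaceForm)) {
            String cls = reading.tag.split(":", 2)[0];
            if (cls.equals(grammaticalClass)) {
                result.add(reading.lemma);
            }
        }
        return result;
    }
}
```

    With "mam" mapped to ("mieć", "verb:fin:sg:pri:imperf") and ("mama", "subst:pl:gen:f"), lemmasOfClass("mam", "subst") returns [mama] and lemmasOfClass("mam", "verb") returns [mieć].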

    MorfologikFilterFactory

    Filter factory for MorfologikFilter.

    An explicit resource name of the dictionary (".dict") can be provided via the dictionary attribute, as the example below demonstrates:

    <fieldType name="text_mylang" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.MorfologikFilterFactory" dictionary="mylang.dict" />
      </analyzer>
    </fieldType>

    If the dictionary attribute is not provided, the Polish dictionary is loaded and used by default.
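    The fallback rule for the dictionary attribute amounts to a single lookup with a default. The Java sketch below is hypothetical: ToyDictionaryResolver and the placeholder constant DEFAULT_POLISH_DICTIONARY are illustrative names, not the resource path or code that MorfologikFilterFactory actually uses.

```java
import java.util.Map;

// Hypothetical sketch of how a filter factory could choose its dictionary.
final class ToyDictionaryResolver {
    // Placeholder for the bundled Polish dictionary; not the real resource name.
    static final String DEFAULT_POLISH_DICTIONARY = "<bundled Polish dictionary>";

    // Returns the configured "dictionary" attribute if present,
    // otherwise the default Polish dictionary.
    static String resolve(Map<String, String> factoryArgs) {
        String configured = factoryArgs.get("dictionary");
        return configured != null ? configured : DEFAULT_POLISH_DICTIONARY;
    }
}
```

    For the fieldType shown above, resolve(Map.of("dictionary", "mylang.dict")) returns "mylang.dict", while a map without a dictionary entry yields the default.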

    See: Morfologik web site

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)