

    Namespace Lucene.Net.Analysis.OpenNlp

    OpenNLP Library Integration

    Classes

    OpenNLPChunkerFilter

    Run OpenNLP chunker. Prerequisite: the OpenNLPTokenizer and OpenNLPPOSFilter must precede this filter. Tags terms in the ITypeAttribute, replacing the POS tags previously put there by OpenNLPPOSFilter.
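    To make the tag hand-off concrete, here is a minimal Python sketch. It is not Lucene.NET code: it models a token stream as a list of tokens whose `type` field stands in for ITypeAttribute. `Token`, `pos_filter`, `chunker_filter`, `toy_tagger`, and `toy_chunker` are all hypothetical, and the toy functions replace the trained OpenNLP models. The B-/I- (BIO) chunk tags are an assumed convention. The sketch shows why the ordering matters: the chunker reads the POS tags that the POS step left in the type slot, then overwrites them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    term: str
    type: str = "word"  # stands in for the value held in ITypeAttribute


# Hypothetical model signatures: a tagger maps words to POS tags,
# a chunker maps (words, POS tags) to chunk tags.
Tagger = Callable[[List[str]], List[str]]
Chunker = Callable[[List[str], List[str]], List[str]]


def pos_filter(tokens: List[Token], tagger: Tagger) -> List[Token]:
    """Write a POS tag into every token's type slot."""
    for tok, tag in zip(tokens, tagger([t.term for t in tokens])):
        tok.type = tag
    return tokens


def chunker_filter(tokens: List[Token], chunker: Chunker) -> List[Token]:
    """Read POS tags from the type slot, then overwrite them with chunk tags."""
    words = [t.term for t in tokens]
    pos_tags = [t.type for t in tokens]  # must already hold POS tags
    for tok, tag in zip(tokens, chunker(words, pos_tags)):
        tok.type = tag
    return tokens


# Toy stand-ins for trained OpenNLP models (hypothetical, illustration only).
_TOY_POS = {"The": "DT", "quick": "JJ", "fox": "NN", "jumps": "VBZ"}


def toy_tagger(words: List[str]) -> List[str]:
    return [_TOY_POS.get(w, "NN") for w in words]


def toy_chunker(words: List[str], pos: List[str]) -> List[str]:
    np_tags = {"DT", "JJ", "NN"}
    out, prev_np = [], False
    for tag in pos:
        if tag in np_tags:
            out.append("I-NP" if prev_np else "B-NP")
            prev_np = True
        else:
            out.append("B-VP" if tag.startswith("VB") else "O")
            prev_np = False
    return out


if __name__ == "__main__":
    tokens = [Token(w) for w in "The quick fox jumps".split()]
    pos_filter(tokens, toy_tagger)       # types: DT JJ NN VBZ
    chunker_filter(tokens, toy_chunker)  # types: B-NP I-NP I-NP B-VP
    print([t.type for t in tokens])
```

    Running the chunker step without the POS step first would feed it the default "word" type instead of POS tags, which is the situation the prerequisite above rules out.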

    OpenNLPChunkerFilterFactory

    Factory for OpenNLPChunkerFilter.

    <fieldType name="text_opennlp_chunked" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
        <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
        <filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="filename"/>
      </analyzer>
    </fieldType>

    OpenNLPLemmatizerFilter

    Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.

    Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.

    The dictionary file must be encoded as UTF-8, with one entry per line, in the form word[tab]lemma[tab]part-of-speech.
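    The lookup order described above (dictionary first, MaxEnt only for out-of-vocabulary tokens) can be sketched in Python as follows. This is an illustration, not the OpenNLP or Lucene.NET implementation. It parses lines in the word[tab]lemma[tab]part-of-speech form stated above and keys entries on the (word, POS) pair. `load_lemma_dictionary`, `load_lemma_file`, `lemmatize`, and `toy_maxent` are hypothetical names, and keeping the surface form when neither source yields a lemma is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical MaxEnt stand-in: (word, pos) -> lemma, or None if it has no answer.
MaxEnt = Callable[[str, str], Optional[str]]


def load_lemma_dictionary(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Parse lines of the form word<TAB>lemma<TAB>part-of-speech."""
    entries: Dict[Tuple[str, str], str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        word, lemma, pos = line.split("\t")
        entries[(word, pos)] = lemma  # keyed on the (word, POS) pair
    return entries


def load_lemma_file(path: str) -> Dict[Tuple[str, str], str]:
    with open(path, encoding="utf-8") as f:  # the file must be UTF-8
        return load_lemma_dictionary(f)


def lemmatize(words: List[str], pos_tags: List[str],
              dictionary: Dict[Tuple[str, str], str],
              maxent: Optional[MaxEnt] = None) -> List[str]:
    lemmas = []
    for word, pos in zip(words, pos_tags):
        lemma = dictionary.get((word, pos))  # 1. dictionary is tried first
        if lemma is None and maxent is not None:
            lemma = maxent(word, pos)  # 2. MaxEnt only for OOV tokens
        # Assumption: keep the surface form if neither source has a lemma.
        lemmas.append(lemma if lemma is not None else word)
    return lemmas


def toy_maxent(word: str, pos: str) -> Optional[str]:
    """Hypothetical model: strips a plural 's' from NNS tokens."""
    if pos == "NNS" and word.endswith("s"):
        return word[:-1]
    return None


if __name__ == "__main__":
    d = load_lemma_dictionary(["mice\tmouse\tNNS\n", "ran\trun\tVBD\n"])
    print(lemmatize(["mice", "ran", "dogs"], ["NNS", "VBD", "NNS"], d, toy_maxent))
    # ['mouse', 'run', 'dog']
```

    Here "mice" and "ran" are resolved by the dictionary, while "dogs" is out of vocabulary and falls through to the model.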

    OpenNLPLemmatizerFilterFactory

    Factory for OpenNLPLemmatizerFilter.

    <fieldType name="text_opennlp_lemma" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory"
                   sentenceModel="filename"
                   tokenizerModel="filename"/>
        <filter class="solr.OpenNLPLemmatizerFilterFactory"
                dictionary="filename"
                lemmatizerModel="filename"/>
      </analyzer>
    </fieldType>

    OpenNLPPOSFilter

    Run OpenNLP POS tagger. Tags all terms in the ITypeAttribute.

    OpenNLPPOSFilterFactory

    Factory for OpenNLPPOSFilter.

    <fieldType name="text_opennlp_pos" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
        <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
      </analyzer>
    </fieldType>

    OpenNLPSentenceBreakIterator

    A BreakIterator that splits sentences using an OpenNLP sentence chunking model.
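    As a rough illustration of break-iterator semantics, the Python sketch below turns sentence start offsets (assumed to come from a sentence detection model) into boundaries and walks them with `first()` / `next()` until a `DONE` sentinel. It is not the Lucene.NET class: `SentenceBreakIteratorSketch`, `sentences`, and the `DONE = -1` value are assumptions chosen to mirror the usual BreakIterator contract.

```python
from typing import List

DONE = -1  # assumed sentinel, mirroring the usual BreakIterator "done" value


class SentenceBreakIteratorSketch:
    """Iterates boundaries: 0, each sentence start after the first, and len(text)."""

    def __init__(self, text: str, sentence_starts: List[int]):
        bounds = {0, len(text)}
        bounds.update(s for s in sentence_starts if 0 < s < len(text))
        self._bounds = sorted(bounds)
        self._pos = 0  # index into self._bounds

    def first(self) -> int:
        self._pos = 0
        return self._bounds[0]

    def current(self) -> int:
        return self._bounds[self._pos]

    def next(self) -> int:
        if self._pos >= len(self._bounds) - 1:
            return DONE
        self._pos += 1
        return self._bounds[self._pos]


def sentences(text: str, sentence_starts: List[int]) -> List[str]:
    """Slice text between consecutive boundaries."""
    it = SentenceBreakIteratorSketch(text, sentence_starts)
    out = []
    start = it.first()
    end = it.next()
    while end != DONE:
        out.append(text[start:end])
        start, end = end, it.next()
    return out


if __name__ == "__main__":
    print(sentences("Hi there. How are you?", [0, 10]))
    # ['Hi there. ', 'How are you?']
```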

    OpenNLPTokenizer

    Run OpenNLP SentenceDetector and Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the IFlagsAttribute; subsequent filters can use this information to process tokens one sentence at a time.
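    A downstream filter can recover sentence boundaries by testing that bit on each token's flags. The Python sketch below models the token stream as (term, flags) pairs and groups terms into sentences. It assumes `EOS_FLAG_BIT = 1`, which is an assumption about the constant's value, and `group_by_sentence` is a hypothetical helper, not part of the Lucene.NET API.

```python
from typing import List, Tuple

EOS_FLAG_BIT = 1  # assumption: value of the end-of-sentence flag bit


def group_by_sentence(tokens: List[Tuple[str, int]]) -> List[List[str]]:
    """Collect terms until a token whose flags have EOS_FLAG_BIT set."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for term, flags in tokens:
        current.append(term)
        if flags & EOS_FLAG_BIT:  # last token of the sentence
            sentences.append(current)
            current = []
    if current:  # trailing tokens without an EOS mark
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    stream = [("Hi", 0), ("there", 0), (".", EOS_FLAG_BIT),
              ("How", 0), ("are", 0), ("you", 0), ("?", EOS_FLAG_BIT)]
    print(group_by_sentence(stream))
    # [['Hi', 'there', '.'], ['How', 'are', 'you', '?']]
```

    Testing with a bitwise AND rather than equality leaves other flag bits free for other filters to use.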

    OpenNLPTokenizerFactory

    Factory for OpenNLPTokenizer.

    <fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
      </analyzer>
    </fieldType>
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)