    Namespace Lucene.Net.Analysis.OpenNlp

    OpenNLP Library Integration

    Classes

    OpenNLPChunkerFilter

    Runs the OpenNLP chunker. Prerequisite: the OpenNLPTokenizer and OpenNLPPOSFilter must precede this filter. Tags terms in the ITypeAttribute, replacing the POS tags previously put there by OpenNLPPOSFilter.
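    The data flow can be pictured with a small Python sketch. It models each token as a term plus a `type` string (playing the role of ITypeAttribute, which holds the POS tag on input) and overwrites those POS tags with chunk tags. The `toy_chunker` function is a hypothetical rule-based stand-in for a trained OpenNLP chunker model, and the BIO-style tags (`B-NP`, `I-NP`, `O`) are an assumption chosen for illustration, not output from a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    term: str
    type: str  # holds the POS tag before chunking and the chunk tag after


def toy_chunker(terms: List[str], pos_tags: List[str]) -> List[str]:
    """Hypothetical stand-in for a chunker model: marks runs of DT/JJ/NN* tags
    as NP chunks and leaves every other token outside any chunk (O)."""
    chunk_tags = []
    in_np = False
    for tag in pos_tags:
        if tag in ("DT", "JJ") or tag.startswith("NN"):
            chunk_tags.append("I-NP" if in_np else "B-NP")
            in_np = True
        else:
            chunk_tags.append("O")
            in_np = False
    return chunk_tags


def apply_chunker(tokens: List[Token],
                  chunker: Callable[[List[str], List[str]], List[str]]) -> None:
    """Replace each token's POS tag with the chunk tag the chunker assigns."""
    terms = [t.term for t in tokens]
    pos_tags = [t.type for t in tokens]  # tags left behind by the POS step
    for token, chunk_tag in zip(tokens, chunker(terms, pos_tags)):
        token.type = chunk_tag  # the POS tag is overwritten, not kept alongside


sentence = [Token("the", "DT"), Token("quick", "JJ"), Token("fox", "NN"),
            Token("jumps", "VBZ")]
apply_chunker(sentence, toy_chunker)
print([t.type for t in sentence])  # ['B-NP', 'I-NP', 'I-NP', 'O']
```

    This is also why the chunker must run after the POS filter: its input is the POS tag sequence, and once it has run, the POS tags are no longer available in the type attribute.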

    OpenNLPChunkerFilterFactory

    Factory for OpenNLPChunkerFilter.

    <fieldType name="text_opennlp_chunked" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
        <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
        <filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="filename"/>
      </analyzer>
    </fieldType>

    OpenNLPLemmatizerFilter

    Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.

    Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.
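    The lookup order described above can be sketched in Python. The dictionary is modeled as a mapping keyed by (word, part-of-speech), and `maxent_lemmatize` is a hypothetical callable standing in for the MaxEnt lemmatizer model; it returns None when it has no answer. Keeping the original word when neither source yields a lemma is an assumption made for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

LemmaDict = Dict[Tuple[str, str], str]


def lemmatize(words: List[str], pos_tags: List[str],
              dictionary: Optional[LemmaDict] = None,
              maxent_lemmatize: Optional[Callable[[str, str], Optional[str]]] = None
              ) -> List[str]:
    """Try the dictionary first; consult the MaxEnt stand-in only on a miss."""
    lemmas = []
    for word, pos in zip(words, pos_tags):
        lemma = None
        if dictionary is not None:
            lemma = dictionary.get((word, pos))
        if lemma is None and maxent_lemmatize is not None:
            # Out-of-vocabulary for the dictionary (or no dictionary): ask the model.
            lemma = maxent_lemmatize(word, pos)
        # Assumption for this sketch: fall back to the surface form.
        lemmas.append(lemma if lemma is not None else word)
    return lemmas


def toy_maxent(word: str, pos: str) -> Optional[str]:
    """Hypothetical model stand-in: strips a plural "s" from plural nouns."""
    return word[:-1] if pos == "NNS" and word.endswith("s") else None


dictionary = {("ran", "VBD"): "run", ("mice", "NNS"): "mouse"}
print(lemmatize(["mice", "ran", "dogs", "quickly"],
                ["NNS", "VBD", "NNS", "RB"], dictionary, toy_maxent))
# ['mouse', 'run', 'dog', 'quickly']
```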

    The dictionary file must be encoded as UTF-8, with one entry per line, in the form word[tab]lemma[tab]part-of-speech.
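    To make that format concrete, the following Python sketch parses such a file into a lookup table keyed by (word, part-of-speech). The function name `load_lemma_dictionary` and the choice to skip blank lines are assumptions for this example; it does not reproduce the library's own loader.

```python
import io
from typing import Dict, TextIO, Tuple


def load_lemma_dictionary(stream: TextIO) -> Dict[Tuple[str, str], str]:
    """Parse lines of the form word<TAB>lemma<TAB>part-of-speech."""
    table: Dict[Tuple[str, str], str] = {}
    for line_no, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated fields")
        word, lemma, pos = fields
        table[(word, pos)] = lemma
    return table


# A real file would be opened with open(path, encoding="utf-8");
# an in-memory sample keeps this example self-contained.
sample = io.StringIO("ran\trun\tVBD\nmice\tmouse\tNNS\nbetter\tgood\tJJR\n")
print(load_lemma_dictionary(sample)[("mice", "NNS")])  # mouse
```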

    OpenNLPLemmatizerFilterFactory

    Factory for OpenNLPLemmatizerFilter.

    <fieldType name="text_opennlp_lemma" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory"
                   sentenceModel="filename"
                   tokenizerModel="filename"/>
        <filter class="solr.OpenNLPLemmatizerFilterFactory"
                dictionary="filename"
                lemmatizerModel="filename"/>
      </analyzer>
    </fieldType>

    OpenNLPPOSFilter

    Runs the OpenNLP POS tagger. Tags all terms in the ITypeAttribute.

    OpenNLPPOSFilterFactory

    Factory for OpenNLPPOSFilter.

    <fieldType name="text_opennlp_pos" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
        <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
      </analyzer>
    </fieldType>

    OpenNLPSentenceBreakIterator

    A sentence break iterator that splits text into sentences using an OpenNLP sentence chunking model.
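    A break iterator reports sentence boundaries as character offsets. The Python sketch below illustrates that idea: given sentence spans of the kind a sentence detector produces (hard-coded here, since no model is loaded), it hands out the boundaries one at a time through `first()` and `next()`. The class `SpanBreakIterator`, its methods, and the `DONE` sentinel are hypothetical names for this illustration, not the Lucene.Net API.

```python
from typing import List, Tuple

DONE = -1  # sentinel returned once all boundaries have been consumed


class SpanBreakIterator:
    """Hypothetical iterator over sentence boundaries derived from (start, end) spans."""

    def __init__(self, text_length: int, spans: List[Tuple[int, int]]):
        # Boundaries: the start of the text, each interior sentence end, and the text end.
        ends = sorted({end for _, end in spans})
        inner = [e for e in ends if 0 < e < text_length]
        self._boundaries = [0] + inner + ([text_length] if text_length > 0 else [])
        self._pos = 0

    def first(self) -> int:
        self._pos = 0
        return self._boundaries[0]

    def next(self) -> int:
        if self._pos + 1 >= len(self._boundaries):
            return DONE
        self._pos += 1
        return self._boundaries[self._pos]


text = "Hello there. How are you? Fine."
# Spans a sentence detector might return for the text above (hard-coded for the demo).
spans = [(0, 12), (13, 25), (26, 31)]
it = SpanBreakIterator(len(text), spans)
boundaries = [it.first()]
while (b := it.next()) != DONE:
    boundaries.append(b)
print(boundaries)  # [0, 12, 25, 31]
```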

    OpenNLPTokenizer

    Runs the OpenNLP SentenceDetector and the OpenNLP Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the IFlagsAttribute; following filters can use this information to apply operations to tokens one sentence at a time.
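    A downstream filter that works one sentence at a time can buffer tokens until it sees one whose flags include the end-of-sentence bit. The Python sketch below shows that grouping over a plain list of (term, flags) pairs rather than a real token stream. The value `EOS_FLAG_BIT = 1` is an assumption made for illustration; real code should use the library's own EOS_FLAG_BIT constant instead of hard-coding it.

```python
from typing import Iterable, Iterator, List, Tuple

EOS_FLAG_BIT = 1  # assumption: illustrative value; use the library's constant in real code


def sentences(tokens: Iterable[Tuple[str, int]]) -> Iterator[List[str]]:
    """Group (term, flags) pairs into sentences using the end-of-sentence bit."""
    current: List[str] = []
    for term, flags in tokens:
        current.append(term)
        if flags & EOS_FLAG_BIT:  # this token closes the sentence
            yield current
            current = []
    if current:  # defensive: emit trailing tokens that carry no EOS mark
        yield current


stream = [("Lucene", 0), ("indexes", 0), ("text", EOS_FLAG_BIT),
          ("OpenNLP", 0), ("tags", 0), ("it", EOS_FLAG_BIT)]
for sent in sentences(stream):
    print(sent)
# ['Lucene', 'indexes', 'text']
# ['OpenNLP', 'tags', 'it']
```

    Buffering a whole sentence matters because taggers and chunkers use the surrounding tokens as context, so they cannot label a token until the rest of its sentence has been read.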

    OpenNLPTokenizerFactory

    Factory for OpenNLPTokenizer.

    <fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
      </analyzer>
    </fieldType>
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.