Namespace Lucene.Net.Analysis.OpenNlp

OpenNLP Library Integration

Classes

OpenNLPChunkerFilter

Runs the OpenNLP chunker. Prerequisite: OpenNLPTokenizer and OpenNLPPOSFilter must precede this filter. Tags terms in the ITypeAttribute, replacing the POS tags previously put there by OpenNLPPOSFilter.

OpenNLPChunkerFilterFactory

Factory for OpenNLPChunkerFilter.

<fieldType name="text_opennlp_chunked" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
    <filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="filename"/>
  </analyzer>
</fieldType>

OpenNLPLemmatizerFilter

Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.

Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.

The dictionary file must be encoded as UTF-8, with one entry per line, in the form word[tab]lemma[tab]part-of-speech.
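The lookup order described above (dictionary first, MaxEnt model for out-of-vocabulary tokens) can be sketched in a few lines. This is a language-neutral illustration in Python, not the Lucene.NET or OpenNLP API: `lemmatize`, the in-memory `dictionary` mapping, and the `toy_model` stand-in for a trained MaxEnt lemmatizer are all hypothetical names. Keying the dictionary on the (word, part-of-speech) pair and returning the token unchanged when nothing matches are assumptions as well.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory dictionary: (word, part-of-speech) -> lemma.
Dictionary = Dict[Tuple[str, str], str]


def lemmatize(
    word: str,
    pos: str,
    dictionary: Optional[Dictionary] = None,
    model_lemmatize: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Try the dictionary first, then fall back to the model for OOV tokens."""
    if dictionary is not None:
        lemma = dictionary.get((word, pos))
        if lemma is not None:
            return lemma  # in-vocabulary: the dictionary answer is used
    if model_lemmatize is not None:
        return model_lemmatize(word, pos)  # out-of-vocabulary: ask the model
    return word  # assumption: leave the token unchanged if nothing matches


# Toy stand-in for a trained MaxEnt lemmatizer (illustration only).
def toy_model(word: str, pos: str) -> str:
    return word[:-1] if pos == "NNS" and word.endswith("s") else word


toy_dictionary = {("mice", "NNS"): "mouse"}

print(lemmatize("mice", "NNS", toy_dictionary, toy_model))     # dictionary hit
print(lemmatize("gadgets", "NNS", toy_dictionary, toy_model))  # model fallback
```

Passing only `dictionary` or only `model_lemmatize` corresponds to configuring just one of the two params.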

OpenNLPLemmatizerFilterFactory

Factory for OpenNLPLemmatizerFilter.

<fieldType name="text_opennlp_lemma" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="filename"
               tokenizerModel="filename"/>
    <filter class="solr.OpenNLPLemmatizerFilterFactory"
            dictionary="filename"
            lemmatizerModel="filename"/>
  </analyzer>
</fieldType>

OpenNLPPOSFilter

Runs the OpenNLP POS tagger. Tags all terms in the ITypeAttribute.

OpenNLPPOSFilterFactory

Factory for OpenNLPPOSFilter.

<fieldType name="text_opennlp_pos" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
  </analyzer>
</fieldType>

OpenNLPSentenceBreakIterator

A BreakIterator that splits sentences using an OpenNLP sentence chunking model.

OpenNLPTokenizer

Runs the OpenNLP SentenceDetector and Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the IFlagsAttribute; subsequent filters can use this information to apply operations to tokens one sentence at a time.
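One way a downstream filter might use the end-of-sentence flag is to buffer tokens until a flagged token arrives. The sketch below is a language-neutral illustration in Python rather than Lucene.NET code: the `(term, flags)` tuples stand in for the term and flags attributes, `group_by_sentence` is a hypothetical helper, and the flag value of 1 is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

EOS_FLAG_BIT = 1  # assumed value; only the bit test below matters

Token = Tuple[str, int]  # (term text, flags) stand-in for token attributes


def group_by_sentence(tokens: Iterable[Token]) -> Iterator[List[str]]:
    """Yield one list of terms per sentence, split on the EOS flag."""
    sentence: List[str] = []
    for term, flags in tokens:
        sentence.append(term)
        if flags & EOS_FLAG_BIT:  # this token is the last one in its sentence
            yield sentence
            sentence = []
    if sentence:  # trailing tokens that never received an EOS marker
        yield sentence


stream = [("Hello", 0), ("world", 0), (".", 1), ("Bye", 0), (".", 1)]
for terms in group_by_sentence(stream):
    print(terms)
```

Testing the bit with `&` rather than comparing for equality keeps the check valid if other flag bits are set on the same token.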

OpenNLPTokenizerFactory

Factory for OpenNLPTokenizer.

<fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
  </analyzer>
</fieldType>