Namespace Lucene.Net.Analysis.OpenNlp
OpenNLP Library Integration
Classes
OpenNLPChunkerFilter
Runs the OpenNLP chunker. Prerequisite: the OpenNLPTokenizer and OpenNLPPOSFilter must precede this filter. Tags terms in the ITypeAttribute, replacing the POS tags previously put there by OpenNLPPOSFilter.
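Because the chunker leaves one tag per token, reassembling whole phrases is left to downstream code. The Python sketch below shows one way to do it, using plain lists in place of the terms and ITypeAttribute values of a real token stream. It assumes BIO-style tags such as B-NP, I-NP and O, the convention of CoNLL-2000-style English chunker models (check the tags your model actually emits). The function name group_chunks is hypothetical and not part of Lucene.Net.

```python
from typing import List, Optional, Tuple


def group_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO chunk tags into (phrase_type, phrase_text) pairs.

    Hypothetical helper: assumes tags of the form "B-XX", "I-XX" or "O".
    """
    chunks: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Emit the phrase being built (if any) and reset the buffer.
        nonlocal current_type, current_words
        if current_type is not None:
            chunks.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new phrase starts; a stray I- tag of another type also starts one.
            flush()
            current_type, current_words = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current phrase.
            current_words.append(token)
        else:
            # "O": the token is outside any phrase.
            flush()
    flush()
    return chunks
```

For example, the tokens "The quick fox jumped over the dog ." tagged B-NP I-NP I-NP B-VP B-PP B-NP I-NP O come back as [("NP", "The quick fox"), ("VP", "jumped"), ("PP", "over"), ("NP", "the dog")].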
OpenNLPChunkerFilterFactory
Factory for OpenNLPChunkerFilter.
<fieldType name="text_opennlp_chunked" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
<filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
<filter class="solr.OpenNLPChunkerFilterFactory" chunkerModel="filename"/>
</analyzer>
</fieldType>
OpenNLPLemmatizerFilter
Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.
Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.
The dictionary file must be encoded as UTF-8, with one entry per line, in the form word[tab]lemma[tab]part-of-speech.
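The dictionary-first, model-second lookup order can be sketched in a few lines of Python. This illustrates the described behavior and is not the Lucene.Net implementation: load_lemma_dictionary and lemmatize are hypothetical names, the table is keyed on the (word, part-of-speech) pair read from each tab-separated line, the fallback callable stands in for the MaxEnt lemmatizer, and keeping the original word when no lemma is found is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def load_lemma_dictionary(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Parse lines of the form word<TAB>lemma<TAB>part-of-speech."""
    table: Dict[Tuple[str, str], str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        word, lemma, pos = line.split("\t")  # expects exactly three fields
        table[(word, pos)] = lemma
    return table


def lemmatize(
    words: List[str],
    tags: List[str],
    dictionary: Dict[Tuple[str, str], str],
    fallback: Optional[Callable[[str, str], str]] = None,
) -> List[str]:
    """Look each (word, tag) up in the dictionary; consult fallback only on a miss."""
    lemmas: List[str] = []
    for word, tag in zip(words, tags):
        lemma = dictionary.get((word, tag))
        if lemma is None and fallback is not None:
            lemma = fallback(word, tag)  # stand-in for the MaxEnt model
        # Assumption: an unresolved token keeps its original form.
        lemmas.append(lemma if lemma is not None else word)
    return lemmas
```

With a dictionary built from the lines "ran\trun\tVBD" and "mice\tmouse\tNNS", lemmatize(["mice", "ran", "quickly"], ["NNS", "VBD", "RB"], dictionary) returns ["mouse", "run", "quickly"]; the fallback is never called for tokens the dictionary already covers.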
OpenNLPLemmatizerFilterFactory
Factory for OpenNLPLemmatizerFilter.
<fieldType name="text_opennlp_lemma" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.OpenNLPTokenizerFactory"
sentenceModel="filename"
tokenizerModel="filename"/>
<filter class="solr.OpenNLPLemmatizerFilterFactory"
dictionary="filename"
lemmatizerModel="filename"/>
</analyzer>
</fieldType>
OpenNLPPOSFilter
Runs the OpenNLP POS tagger and stores each term's part-of-speech tag in the ITypeAttribute.
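Once POS tags sit in the type attribute, later stages can act on them. The hypothetical Python sketch below keeps only tokens whose tag starts with one of the given prefixes, with (term, tag) pairs standing in for a token stream. The Penn Treebank tags in the usage note (NN, NNS, VBD and so on) are what English OpenNLP POS models typically emit, but the tagset depends on the model you load; keep_by_pos_prefix is an illustrative name, not a Lucene.Net API.

```python
from typing import Iterable, List, Tuple


def keep_by_pos_prefix(
    tagged: Iterable[Tuple[str, str]], prefixes: Iterable[str]
) -> List[str]:
    """Return terms whose POS tag (the token type) starts with any given prefix."""
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [term for term, pos in tagged if pos.startswith(prefix_tuple)]
```

For example, keep_by_pos_prefix([("The", "DT"), ("mice", "NNS"), ("ate", "VBD"), ("cheese", "NN")], ["NN"]) returns ["mice", "cheese"], since the "NN" prefix covers both singular and plural noun tags.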
OpenNLPPOSFilterFactory
Factory for OpenNLPPOSFilter.
<fieldType name="text_opennlp_pos" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
<filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="filename"/>
</analyzer>
</fieldType>
OpenNLPSentenceBreakIterator
A BreakIterator that splits sentences using an OpenNLP sentence chunking model.
OpenNLPTokenizer
Runs the OpenNLP SentenceDetector and the OpenNLP Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the IFlagsAttribute; following filters can use this information to process tokens one sentence at a time.
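Sentence-at-a-time processing in a downstream filter amounts to buffering tokens until one carries the end-of-sentence bit. The Python sketch below models each token as a (term, flags) pair. The value 1 for EOS_FLAG_BIT is an assumption made for illustration; in real code, read the constant from OpenNLPTokenizer rather than hard-coding it. The function name split_sentences is hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed value for illustration; use OpenNLPTokenizer's own constant in real code.
EOS_FLAG_BIT = 1


def split_sentences(tokens: Iterable[Tuple[str, int]]) -> List[List[str]]:
    """Group (term, flags) pairs into sentences using the end-of-sentence bit."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for term, flags in tokens:
        current.append(term)
        if flags & EOS_FLAG_BIT:
            # This token ends a sentence: close the current group.
            sentences.append(current)
            current = []
    if current:
        # Trailing tokens without an EOS mark still form a (partial) sentence.
        sentences.append(current)
    return sentences
```

For example, split_sentences([("Hi", 0), (".", 1), ("Bye", 0), (".", 1)]) returns [["Hi", "."], ["Bye", "."]]. Testing the bit with a mask rather than comparing the whole flags value leaves other flag bits free for other filters.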
OpenNLPTokenizerFactory
Factory for OpenNLPTokenizer.
<fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="filename" tokenizerModel="filename"/>
</analyzer>
</fieldType>