Namespace Lucene.Net.Analysis.Ja
Analyzer for Japanese.
Classes
GraphvizFormatter
Outputs the DOT (Graphviz) string for the Viterbi lattice.
JapaneseAnalyzer
Analyzer for Japanese that uses morphological analysis.
JapaneseBaseFormFilter
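Replaces term text with the IBaseFormAttribute.
This acts as a lemmatizer for verbs and adjectives.
To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the IKeywordAttribute before this TokenStream.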
JapaneseBaseFormFilterFactory
Factory for JapaneseBaseFormFilter.
<fieldType name="text_ja" class="solr.TextField">
<analyzer>
<tokenizer class="solr.JapaneseTokenizerFactory"/>
<filter class="solr.JapaneseBaseFormFilterFactory"/>
</analyzer>
</fieldType>
JapaneseIterationMarkCharFilter
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
JapaneseIterationMarkCharFilterFactory
Factory for JapaneseIterationMarkCharFilter.
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
<analyzer>
<charFilter class="solr.JapaneseIterationMarkCharFilterFactory" normalizeKanji="true" normalizeKana="true"/>
<tokenizer class="solr.JapaneseTokenizerFactory"/>
</analyzer>
</fieldType>
JapaneseKatakanaStemFilter
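A TokenFilter that normalizes common katakana spelling variations ending in a long sound character (U+30FC) by removing that character.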
JapaneseKatakanaStemFilterFactory
Factory for JapaneseKatakanaStemFilter.
<fieldType name="text_ja" class="solr.TextField">
<analyzer>
<tokenizer class="solr.JapaneseTokenizerFactory"/>
<filter class="solr.JapaneseKatakanaStemFilterFactory"
minimumLength="4"/>
</analyzer>
</fieldType>
JapanesePartOfSpeechStopFilter
Removes tokens that match a set of part-of-speech tags.
JapanesePartOfSpeechStopFilterFactory
Factory for JapanesePartOfSpeechStopFilter.
<fieldType name="text_ja" class="solr.TextField">
<analyzer>
<tokenizer class="solr.JapaneseTokenizerFactory"/>
<filter class="solr.JapanesePartOfSpeechStopFilterFactory"
tags="stopTags.txt"
enablePositionIncrements="true"/>
</analyzer>
</fieldType>
JapaneseReadingFormFilter
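A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form.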
JapaneseReadingFormFilterFactory
Factory for JapaneseReadingFormFilter.
<fieldType name="text_ja" class="solr.TextField">
<analyzer>
<tokenizer class="solr.JapaneseTokenizerFactory"/>
<filter class="solr.JapaneseReadingFormFilterFactory"
useRomaji="false"/>
</analyzer>
</fieldType>
JapaneseTokenizer
Tokenizer for Japanese that uses morphological analysis.
JapaneseTokenizerFactory
Factory for JapaneseTokenizer.
<fieldType name="text_ja" class="solr.TextField">
<analyzer>
<tokenizer class="solr.JapaneseTokenizerFactory"
mode="NORMAL"
userDictionary="user.txt"
userDictionaryEncoding="UTF-8"
discardPunctuation="true"
/>
<filter class="solr.JapaneseBaseFormFilterFactory"/>
</analyzer>
</fieldType>
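The factories listed on this page can also be chained in a single field type. The following is a minimal sketch rather than a recommended configuration: it reuses only the attribute values shown in the individual samples above, the field type name "text_ja_combined" is hypothetical, and the filter order is an assumption (base form first, then part-of-speech stop filtering, then katakana stemming).

```xml
<fieldType name="text_ja_combined" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- Expand iteration marks before tokenization -->
    <charFilter class="solr.JapaneseIterationMarkCharFilterFactory" normalizeKanji="true" normalizeKana="true"/>
    <!-- Morphological tokenizer; mode value taken from the sample above -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="NORMAL" discardPunctuation="true"/>
    <!-- Assumed order: lemmatize, drop tokens by part-of-speech tag, stem katakana -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="stopTags.txt"/>
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
  </analyzer>
</fieldType>
```

The stopTags.txt file is the same one referenced in the JapanesePartOfSpeechStopFilterFactory sample and must exist in the configuration directory.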
Token
Analyzed token with morphological data from its dictionary.
Enums
JapaneseTokenizerMode
Tokenization mode: this determines how the tokenizer handles compound and unknown words.
JapaneseTokenizerType
Token type reflecting the original source of this token.
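As an illustration of JapaneseTokenizerMode, the mode attribute of JapaneseTokenizerFactory can be set to a value other than the "NORMAL" used in the sample above. The sketch below assumes the SEARCH value, which is geared toward search and decompounds long nouns; the field type name "text_ja_search" is hypothetical.

```xml
<fieldType name="text_ja_search" class="solr.TextField">
  <analyzer>
    <!-- Assumption: SEARCH mode; NORMAL and EXTENDED are the other modes -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="SEARCH"/>
  </analyzer>
</fieldType>
```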