Namespace Lucene.Net.Analysis.CommonGrams
Construct n-grams for frequently occurring terms and phrases.
Classes
CommonGramsFilter
Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed, with the bigrams overlaid on them; this is done by setting Lucene.Net.Analysis.TokenAttributes.IPositionIncrementAttribute.PositionIncrement. Bigrams have the token type GRAM_TYPE. Example:
- input: "the quick brown fox" (assuming "the" is the only common word in the input)
- output: |"the","the-quick"|"quick"|"brown"|"fox"|
- "the-quick" has a position increment of 0, so it occupies the same position as "the"; its token type (ITypeAttribute.Type) is "gram"
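The index-time behavior described above can be modeled in a few lines. This is a minimal, language-neutral sketch in Python, not the Lucene.Net API: the names `common_grams` and `Token` are hypothetical, case-insensitive matching (`ignoreCase`) is not modeled, and the separator is a parameter that defaults to "-" only to match the examples above. Every input word is kept, and a bigram is overlaid (position increment 0) whenever either word of an adjacent pair is common.

```python
from typing import List, NamedTuple, Set

GRAM_TYPE = "gram"


class Token(NamedTuple):
    # Hypothetical token record for this sketch only.
    term: str
    position_increment: int  # 0 means "same position as the previous token"
    token_type: str          # "word" for single terms, "gram" for bigrams


def common_grams(words: List[str], common: Set[str], separator: str = "-") -> List[Token]:
    """Model of index-time common-grams output (exact, case-sensitive matching)."""
    out: List[Token] = []
    for i, word in enumerate(words):
        if i > 0:
            prev = words[i - 1]
            if prev in common or word in common:
                # The bigram starts where the previous word starts, so it is
                # emitted right after that word with a position increment of 0.
                out.append(Token(prev + separator + word, 0, GRAM_TYPE))
        # The single term is always kept and advances the position by 1.
        out.append(Token(word, 1, "word"))
    return out


if __name__ == "__main__":
    for t in common_grams("the quick brown fox".split(), {"the"}):
        print(t.term, t.position_increment, t.token_type)
    # the 1 word
    # the-quick 0 gram
    # quick 1 word
    # brown 1 word
    # fox 1 word
```

Emitting the bigram before the second word (rather than after it) keeps the stream ordered by position, which is why "the-quick" can use an increment of 0 relative to "the".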
CommonGramsFilterFactory
Constructs a CommonGramsFilter.
<fieldType name="text_cmmngrms" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.CommonGramsFilterFactory" words="commongramsstopwords.txt" ignoreCase="false"/>
</analyzer>
</fieldType>
CommonGramsQueryFilter
Wraps a CommonGramsFilter to optimize phrase queries: single words are returned only when they are not a member of a bigram.
Example (assuming the common words include "the" and "in"):
- query input to CommonGramsFilter: "the rain in spain falls mainly"
- output of CommonGramsFilter/input to CommonGramsQueryFilter: |"the","the-rain"|"rain","rain-in"|"in","in-spain"|"spain"|"falls"|"mainly"|
- output of CommonGramsQueryFilter: "the-rain", "rain-in", "in-spain", "falls", "mainly"
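The query-time rule stated above can be sketched as follows. This is a simplified Python model of the rule as written (keep every bigram, keep a single word only if no bigram contains it); it is not a port of the Lucene.Net implementation, which consumes the CommonGramsFilter token stream incrementally and may behave differently at the edges. The function name `common_grams_query` is hypothetical, matching is exact (no `ignoreCase`), and the "-" separator is chosen only to match the example.

```python
from typing import List, Set


def common_grams_query(words: List[str], common: Set[str], separator: str = "-") -> List[str]:
    """Model of the stated query-time rule: bigrams win over their member words."""
    # bigram_at[i] is True when a bigram covers words[i] and words[i + 1],
    # i.e. when either word of that adjacent pair is common.
    bigram_at = [
        words[i] in common or words[i + 1] in common
        for i in range(len(words) - 1)
    ]
    out: List[str] = []
    last = len(words) - 1
    for i, word in enumerate(words):
        # A word is a bigram member if a bigram starts at i - 1 or at i.
        in_bigram = (i > 0 and bigram_at[i - 1]) or (i < last and bigram_at[i])
        if not in_bigram:
            out.append(word)
        if i < last and bigram_at[i]:
            out.append(word + separator + words[i + 1])
    return out


if __name__ == "__main__":
    print(common_grams_query("the rain in spain falls mainly".split(), {"the", "in"}))
    # ['the-rain', 'rain-in', 'in-spain', 'falls', 'mainly']
```

Dropping the single words that a bigram already covers is what makes phrase queries cheaper: the query matches on the rarer bigram terms instead of on the very frequent common words.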
CommonGramsQueryFilterFactory
Constructs a CommonGramsQueryFilter.
<fieldType name="text_cmmngrmsqry" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.CommonGramsQueryFilterFactory" words="commongramsquerystopwords.txt" ignoreCase="false"/>
</analyzer>
</fieldType>