Namespace Lucene.Net.Analysis.CommonGrams
Construct n-grams for frequently occurring terms and phrases.
Classes
CommonGramsFilter
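Construct bigrams for frequently occurring terms while indexing. Single terms
are still indexed too, with bigrams overlaid. This is achieved through position
increments: each bigram is emitted with a position increment of 0, which stacks it
on the same position as its first term. Example: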
- input: "the quick brown fox"
- output: |"the","the-quick"|"quick"|"brown"|"fox"|
- "the-quick" has a position increment of 0, so it occupies the same position as "the"; its token type is "gram".
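The overlay can be illustrated with a small simulation. This is a minimal sketch, not the Lucene.Net API: tokens are plain tuples, the unigram type label "word" and the "-" separator are assumptions chosen to match the example above, and a bigram is formed whenever the current term or the next term is a common word (consistent with the examples on this page). Each bigram is stacked on its first term with a position increment of 0.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    term: str
    position_increment: int  # 1 = advance to a new position, 0 = stack on the previous position
    token_type: str          # "word" for single terms (assumed label), "gram" for bigrams


SEPARATOR = "-"  # assumption: matches the example above, not necessarily the library's separator


def common_grams(terms: List[str], common_words: Set[str]) -> List[Token]:
    """Emit every single term, plus a stacked bigram whenever the term
    or the following term is a common word (case-sensitive match)."""
    out: List[Token] = []
    for i, term in enumerate(terms):
        out.append(Token(term, 1, "word"))
        if i + 1 < len(terms):
            nxt = terms[i + 1]
            if term in common_words or nxt in common_words:
                # Position increment 0: same position as `term`.
                out.append(Token(term + SEPARATOR + nxt, 0, "gram"))
    return out


def positions(tokens: List[Token]) -> List[List[str]]:
    """Group tokens by position, mirroring the |...|...| notation."""
    groups: List[List[str]] = []
    for tok in tokens:
        if tok.position_increment > 0 or not groups:
            groups.append([])
        groups[-1].append(tok.term)
    return groups


tokens = common_grams("the quick brown fox".split(), {"the"})
print(positions(tokens))
# [['the', 'the-quick'], ['quick'], ['brown'], ['fox']]
```

Because the bigram shares a position with "the", a phrase query can match either the single terms or the bigram without disturbing the positions of the following words.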
CommonGramsFilterFactory
Constructs a CommonGramsFilter. Example field type configuration:
<fieldType name="text_cmmngrms" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.CommonGramsFilterFactory" words="commongramsstopwords.txt" ignoreCase="false"/>
</analyzer>
</fieldType>
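The `words` and `ignoreCase` attributes control which terms count as common. The sketch below is a hypothetical loader (`load_common_words` and `is_common` are illustrative names, not library functions); it assumes one word per line with blank lines and `#` comment lines skipped, and shows that `ignoreCase="false"` makes matching case-sensitive.

```python
from typing import Iterable, Set


def load_common_words(lines: Iterable[str], ignore_case: bool) -> Set[str]:
    """Hypothetical loader: one word per line; blank and '#' lines skipped (assumption)."""
    words: Set[str] = set()
    for line in lines:
        word = line.strip()
        if not word or word.startswith("#"):
            continue
        # With ignore_case, store a lowercased form so lookups can normalize too.
        words.add(word.lower() if ignore_case else word)
    return words


def is_common(term: str, words: Set[str], ignore_case: bool) -> bool:
    """Case-sensitive lookup unless ignore_case is set."""
    return (term.lower() if ignore_case else term) in words


file_lines = ["# common words", "the", "", "in"]
strict = load_common_words(file_lines, ignore_case=False)
print(is_common("The", strict, ignore_case=False))  # False
relaxed = load_common_words(file_lines, ignore_case=True)
print(is_common("The", relaxed, ignore_case=True))  # True
```

Whatever setting is chosen, the index-time and query-time chains should load the same word list with the same case handling, otherwise the bigrams produced at query time may not exist in the index.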
CommonGramsQueryFilter
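Wraps a CommonGramsFilter to optimize phrase queries: bigrams are emitted, and single words are emitted only when they are not part of a bigram.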
Example:
- query input to CommonGramsFilter: "the rain in spain falls mainly"
- output of CommonGramsFilter/input to CommonGramsQueryFilter: |"the","the-rain"|"rain","rain-in"|"in","in-spain"|"spain"|"falls"|"mainly"|
- output of CommonGramsQueryFilter: "the-rain", "rain-in", "in-spain", "falls", "mainly"
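The selection rule in this example can be expressed as a small function over the position groups. This is a minimal sketch of the documented rule, not the library's streaming implementation, whose handling of edge cases may differ. It assumes each group holds the single term first and at most one bigram second (the bigram that starts at that term), as in the input shown above; `common_grams_query` is a hypothetical name.

```python
from typing import List


def common_grams_query(groups: List[List[str]]) -> List[str]:
    """Keep every bigram; keep a single term only if no bigram starts
    on it (same group) or ends on it (previous group)."""
    out: List[str] = []
    for k, group in enumerate(groups):
        starts_gram = len(group) > 1
        ends_previous_gram = k > 0 and len(groups[k - 1]) > 1
        if starts_gram:
            out.append(group[1])      # emit the bigram instead of the single term
        elif not ends_previous_gram:
            out.append(group[0])      # term is not a member of any bigram
    return out


filter_output = [["the", "the-rain"], ["rain", "rain-in"], ["in", "in-spain"],
                 ["spain"], ["falls"], ["mainly"]]
print(common_grams_query(filter_output))
# ['the-rain', 'rain-in', 'in-spain', 'falls', 'mainly']
```

Dropping "spain" loses nothing for a phrase query, because "in-spain" already requires that term at that position; the query ends up with fewer, rarer terms than the raw single words.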
CommonGramsQueryFilterFactory
Constructs a CommonGramsQueryFilter. Example field type configuration:
<fieldType name="text_cmmngrmsqry" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.CommonGramsQueryFilterFactory" words="commongramsquerystopwords.txt" ignoreCase="false"/>
</analyzer>
</fieldType>