    Namespace Lucene.Net.Analysis.CommonGrams

    Construct n-grams for frequently occurring terms and phrases.

    Classes

    CommonGramsFilter

    Construct bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. This is achieved through the use of PositionIncrement. Bigrams have a type of GRAM_TYPE. Example:

    • input:"the quick brown fox"
    • output:|"the","the-quick"|"quick"|"brown"|"fox"|
    • "the-quick" has a position increment of 0, so it is in the same position as "the"
    • "the-quick" has a term.type() of "gram"
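
    The overlay described above can be illustrated with a small, self-contained Python simulation. This is only a sketch of the idea, not the Lucene.Net API: the names Token and common_grams are hypothetical, the "-" separator simply follows the examples on this page (the real filter's separator may differ), and it assumes a bigram is formed whenever either word of an adjacent pair is in the common-word set.

```python
from typing import Iterable, List, NamedTuple, Set

GRAM_TYPE = "gram"  # token type used for bigrams, as named on this page
WORD_TYPE = "word"  # hypothetical type name for ordinary single terms


class Token(NamedTuple):
    term: str
    type: str
    position_increment: int


def common_grams(words: Iterable[str], common: Set[str], sep: str = "-") -> List[Token]:
    """Emit every single term, overlaying a bigram where a common word is involved."""
    words = list(words)
    tokens: List[Token] = []
    for i, word in enumerate(words):
        # Single terms are always kept and advance the position by one.
        tokens.append(Token(word, WORD_TYPE, 1))
        has_next = i + 1 < len(words)
        # Assumption: a pair becomes a bigram if either member is a common word.
        if has_next and (word in common or words[i + 1] in common):
            # Position increment 0 places the bigram at the same position as `word`.
            tokens.append(Token(word + sep + words[i + 1], GRAM_TYPE, 0))
    return tokens


stream = common_grams("the quick brown fox".split(), {"the"})
print([t.term for t in stream])
# ['the', 'the-quick', 'quick', 'brown', 'fox']
```

    Only "the-quick" carries a position increment of 0 and the "gram" type here; every other token is an ordinary term that advances the position by one.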

    CommonGramsFilterFactory

    Constructs a CommonGramsFilter.

    <fieldType name="text_cmmngrms" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="commongramsstopwords.txt" ignoreCase="false"/>
      </analyzer>
    </fieldType>

    CommonGramsQueryFilter

    Wrap a CommonGramsFilter, optimizing phrase queries by returning single words only when they are not a member of a bigram.

    Example:

    • query input to CommonGramsFilter: "the rain in spain falls mainly"
    • output of CommonGramsFilter/input to CommonGramsQueryFilter: |"the","the-rain"|"rain","rain-in"|"in","in-spain"|"spain"|"falls"|"mainly"|
    • output of CommonGramsQueryFilter: "the-rain", "rain-in", "in-spain", "falls", "mainly"
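
    The rule in this example can be sketched as a short Python simulation. It is not the Lucene.Net implementation: Token and query_grams are hypothetical names, and the logic follows the rule exactly as worded on this page (a single word is returned only when it is not a member of a bigram). The real filter is a streaming token filter, and its handling of edge cases may differ.

```python
from typing import Iterable, List, NamedTuple, Optional

GRAM_TYPE = "gram"  # token type used for bigrams, as named on this page


class Token(NamedTuple):
    term: str
    type: str
    position_increment: int


def query_grams(tokens: Iterable[Token]) -> List[str]:
    """Keep bigrams; keep a single word only if no bigram contains it."""
    out: List[str] = []
    pending: Optional[str] = None  # single word not yet emitted or discarded
    pending_covered = False        # pending word is the second half of a bigram
    after_gram = False             # the previous position carried a bigram
    for tok in tokens:
        if tok.type == GRAM_TYPE:
            # The bigram overlays the pending word: keep the bigram, drop the word.
            out.append(tok.term)
            pending = None
            after_gram = True
        else:
            # A new position starts, so the previous word can be resolved.
            if pending is not None and not pending_covered:
                out.append(pending)
            pending_covered = after_gram
            pending = tok.term
            after_gram = False
    if pending is not None and not pending_covered:
        out.append(pending)
    return out


# Token stream from the example above ("the" and "in" act as common words).
stream = [
    Token("the", "word", 1), Token("the-rain", GRAM_TYPE, 0),
    Token("rain", "word", 1), Token("rain-in", GRAM_TYPE, 0),
    Token("in", "word", 1), Token("in-spain", GRAM_TYPE, 0),
    Token("spain", "word", 1),
    Token("falls", "word", 1),
    Token("mainly", "word", 1),
]
print(query_grams(stream))
# ['the-rain', 'rain-in', 'in-spain', 'falls', 'mainly']
```

    In this sketch "spain" is dropped because it is the second half of "in-spain", while "falls" and "mainly" survive because no bigram contains them.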

    CommonGramsQueryFilterFactory

    Constructs a CommonGramsQueryFilter.

    <fieldType name="text_cmmngrmsqry" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.CommonGramsQueryFilterFactory" words="commongramsquerystopwords.txt" ignoreCase="false"/>
      </analyzer>
    </fieldType>

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)