    Namespace Lucene.Net.Analysis.CommonGrams

    Construct n-grams for frequently occurring terms and phrases.

    Classes

    CommonGramsFilter

    Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. This is achieved through the use of PositionIncrement. Bigrams have a type of GRAM_TYPE.

    Example:

    • input: "the quick brown fox"
    • output: |"the","the-quick"|"quick"|"brown"|"fox"|
    • "the-quick" has a position increment of 0, so it is in the same position as "the"
    • "the-quick" has a term.type() of "gram"
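
The overlay described above can be modelled in a few lines. This is a minimal sketch, not the Lucene.Net implementation: the `Token` tuple, the `common_grams` function, and the `sep` parameter are hypothetical names, the "-" separator simply follows the notation used in the example, and the rule that a bigram is formed when either neighbour is a common word is inferred from the examples on this page.

```python
from typing import Iterable, List, NamedTuple, Set


class Token(NamedTuple):
    term: str
    position_increment: int  # 0 means "same position as the previous token"
    type: str                # "word" for single terms, "gram" for bigrams


def common_grams(words: Iterable[str], common: Set[str], sep: str = "-") -> List[Token]:
    """Model of the index-time overlay (hypothetical helper, not the library API)."""
    words = list(words)
    out: List[Token] = []
    for i, word in enumerate(words):
        # Every single term is kept and advances the position by one.
        out.append(Token(word, 1, "word"))
        if i + 1 < len(words):
            nxt = words[i + 1]
            # Assumption: a bigram is built when either neighbour is a common word.
            if word in common or nxt in common:
                # Position increment 0 overlays the bigram on the single term.
                out.append(Token(word + sep + nxt, 0, "gram"))
    return out


tokens = common_grams("the quick brown fox".split(), {"the"})
print([t.term for t in tokens])
# ['the', 'the-quick', 'quick', 'brown', 'fox']
```

Because the bigram carries a position increment of 0, a phrase query can match either the single term or the bigram at the same position.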

    CommonGramsFilterFactory

    Constructs a CommonGramsFilter.

    <fieldType name="text_cmmngrms" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="commongramsstopwords.txt" ignoreCase="false"/>
      </analyzer>
    </fieldType>

    CommonGramsQueryFilter

    Wraps a CommonGramsFilter, optimizing phrase queries by returning single words only when they are not a member of a bigram.

    Example:

    • query input to CommonGramsFilter: "the rain in spain falls mainly"
    • output of CommonGramsFilter/input to CommonGramsQueryFilter: |"the","the-rain"|"rain","rain-in"|"in","in-spain"|"spain"|"falls"|"mainly"|
    • output of CommonGramsQueryFilter: "the-rain", "rain-in", "in-spain", "falls", "mainly"
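
The query-side rule can be sketched the same way. This is a minimal model of the rule as stated on this page, assuming the input stream lists each bigram immediately after its first word; `common_grams_query` is a hypothetical name, and the library's streaming filter may differ in edge cases (for instance in how it treats a single word that directly follows a bigram).

```python
from typing import List, Tuple

# (term, type) pairs, where type is "word" or "gram"
TypedTerm = Tuple[str, str]


def common_grams_query(tokens: List[TypedTerm]) -> List[str]:
    """Keep every bigram; keep a single word only if it is in no bigram."""
    out: List[str] = []
    for i, (term, typ) in enumerate(tokens):
        if typ == "gram":
            out.append(term)
            continue
        # A word is the second half of the bigram just before it in the stream,
        # and the first half of the bigram just after it.
        prev_is_gram = i > 0 and tokens[i - 1][1] == "gram"
        next_is_gram = i + 1 < len(tokens) and tokens[i + 1][1] == "gram"
        if not (prev_is_gram or next_is_gram):
            out.append(term)
    return out


stream = [
    ("the", "word"), ("the-rain", "gram"),
    ("rain", "word"), ("rain-in", "gram"),
    ("in", "word"), ("in-spain", "gram"),
    ("spain", "word"), ("falls", "word"), ("mainly", "word"),
]
print(common_grams_query(stream))
# ['the-rain', 'rain-in', 'in-spain', 'falls', 'mainly']
```

Dropping the covered single words leaves the query with fewer, rarer terms to match, which is the optimization the description refers to.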

    CommonGramsQueryFilterFactory

    Constructs a CommonGramsQueryFilter.

    <fieldType name="text_cmmngrmsqry" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.CommonGramsQueryFilterFactory" words="commongramsquerystopwords.txt" ignoreCase="false"/>
      </analyzer>
    </fieldType>
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.