    Namespace Lucene.Net.Analysis.Phonetic

    Analysis components for indexing phonetic signatures (for sounds-alike search).

    For an introduction to Lucene's analysis API, see the Lucene.Net.Analysis namespace documentation.

    This module provides analysis components (using encoders ported to .NET from Apache Commons Codec) that index and search phonetic signatures.

    Classes

    BeiderMorseFilter

    TokenFilter for Beider-Morse phonetic encoding.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    BeiderMorseFilterFactory

    Factory for BeiderMorseFilter.

    <fieldType name="text_bm" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.BeiderMorseFilterFactory"
           nameType="GENERIC" ruleType="APPROX"
           concat="true" languageSet="auto"/>
      </analyzer>
    </fieldType>

    DoubleMetaphoneFilter

    Filter for DoubleMetaphone (supporting secondary codes).

    DoubleMetaphoneFilterFactory

    Factory for DoubleMetaphoneFilter.

    <fieldType name="text_dblmtphn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="true" maxCodeLength="4"/>
      </analyzer>
    </fieldType>

    PhoneticFilter

    Creates tokens for phonetic matches. See the Language namespace.

    PhoneticFilterFactory

    Factory for PhoneticFilter.

    Creates tokens based on phonetic encoders from the Language namespace.

    This factory takes one required argument, "encoder"; the rest are optional:
    • encoder (required): one of "DoubleMetaphone", "Metaphone", "Soundex", "RefinedSoundex", "Caverphone" (v2.0), or "ColognePhonetic" (case insensitive). Any other value is resolved as a class name: as given if it already contains a '.', otherwise as a class in the same package as the encoders listed above.
    • inject (default=true): add the phonetic tokens to the stream in addition to the original tokens, at the same position (position increment 0).
    • maxCodeLength: the maximum length of the phonetic codes, as defined by the encoder. Specifying it for an encoder that does not support it is an error.

    <fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
      </analyzer>
    </fieldType>
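
    As a further illustration of the optional arguments, the following sketch indexes only the phonetic codes and caps their length. It assumes that the "Metaphone" encoder accepts maxCodeLength and that inject="false" causes the codes to replace the original tokens instead of being added alongside them. The field type name text_metaphone is hypothetical.

    <fieldType name="text_metaphone" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Assumption: Metaphone supports maxCodeLength; an encoder that does not would make this an error. -->
        <!-- inject="false": emit only the phonetic codes, not the original tokens. -->
        <filter class="solr.PhoneticFilterFactory" encoder="Metaphone" inject="false" maxCodeLength="4"/>
      </analyzer>
    </fieldType>

    A field configured this way would likely match only on sound, so exact-spelling queries against the same text may need a separate field.
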
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.