Class DaitchMokotoffSoundex
Encodes a string into a Daitch-Mokotoff Soundex value.
Implements: IStringEncoder
Namespace: Lucene.Net.Analysis.Phonetic.Language
Assembly: Lucene.Net.Analysis.Phonetic.dll
Syntax
public class DaitchMokotoffSoundex : IStringEncoder
Remarks
The Daitch-Mokotoff Soundex algorithm is a refinement of the Russell and American Soundex algorithms. It matches surnames that sound alike but are spelled differently more accurately, especially Slavic and Yiddish surnames.
The main differences compared to the other Soundex variants are:
- coded names are 6 digits long
- the initial character of the name is coded
- rules to encode multi-character n-grams
- multiple possible encodings for the same name (branching)
This implementation supports branching, depending on the method used:
- Encode(string): branching disabled, only the first code will be returned
- GetSoundex(string): branching enabled, all codes will be returned, separated by '|'
The default rules are contained in the resource file Lucene.Net.Analysis.Phonetic.Language.dmrules.txt.
This class is thread-safe.
See: Wikipedia - Daitch-Mokotoff Soundex
See: Avotaynu - Soundexing and Genealogy
since 1.10
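The remarks above describe two output shapes: a single six-digit code from Encode(string), and one or more six-digit codes joined by '|' from GetSoundex(string). The sketch below is a hypothetical helper (DmCodeFormat is not part of Lucene.NET) that checks a value has that shape. It assumes only what the remarks state: every code is exactly six ASCII digits.

```csharp
using System.Linq;

// Hypothetical helper, not part of Lucene.NET: checks the output shape
// described in the remarks (six-digit codes, optionally joined by '|').
public static class DmCodeFormat
{
    // True if `code` is exactly six ASCII digits, e.g. "097400".
    public static bool IsSingleCode(string code) =>
        code != null && code.Length == 6 && code.All(c => c >= '0' && c <= '9');

    // True if `value` is one or more six-digit codes separated by '|',
    // e.g. "097400|097500". Empty segments (such as a trailing '|') fail.
    public static bool IsCodeList(string value) =>
        value != null && value.Split('|').All(IsSingleCode);
}
```

Such a check could be applied to the strings returned by Encode(string) and GetSoundex(string) in tests, for example when validating stored codes before comparing them.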
Constructors
DaitchMokotoffSoundex()
Creates a new instance with ASCII-folding enabled.
Declaration
public DaitchMokotoffSoundex()
DaitchMokotoffSoundex(bool)
Creates a new instance.
With ASCII-folding enabled, certain accented characters will be transformed to equivalent ASCII characters, e.g. è -> e.
Declaration
public DaitchMokotoffSoundex(bool folding)
Parameters
Type | Name | Description |
---|---|---|
bool | folding | Whether ASCII-folding is performed before encoding. |
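To illustrate what ASCII-folding means (è -> e), the sketch below approximates it with Unicode canonical decomposition: each character is decomposed (NFD) and the combining accent marks are dropped. This is an assumption for illustration only; FoldingSketch is hypothetical, and the class's own folding may use a different character mapping, so results can differ. For instance, ß has no canonical decomposition and passes through this sketch unchanged.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical illustration of ASCII-folding, NOT Lucene.NET's implementation:
// decompose each character (NFD) and drop the combining accent marks.
public static class FoldingSketch
{
    public static string Fold(string input)
    {
        // "è" (U+00E8) decomposes into "e" + U+0300 (combining grave accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep base characters; skip non-spacing marks such as U+0300.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever remains (e.g. characters that had no accent to strip).
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

To compare the two modes with the real class, construct `new DaitchMokotoffSoundex(true)` and `new DaitchMokotoffSoundex(false)`; the parameterless constructor creates an instance with ASCII-folding enabled.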
Methods
Encode(string)
Encodes a string using the Daitch-Mokotoff Soundex algorithm without branching.
Declaration
public virtual string Encode(string source)
Parameters
Type | Name | Description |
---|---|---|
string | source | A string to encode. |
Returns
Type | Description |
---|---|
string | A DM Soundex code corresponding to the string supplied. |
Exceptions
Type | Condition |
---|---|
ArgumentException | If a character is not mapped. |
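The Exceptions table above documents an ArgumentException when a character is not mapped. When encoding many names (for example while building an index), a caller may prefer to skip such names rather than abort. The sketch below is a hypothetical wrapper (SafeEncoding and TryEncode are not part of Lucene.NET) that takes any string encoder as a delegate, so it can be exercised without the library.

```csharp
using System;

// Hypothetical wrapper, not part of Lucene.NET: calls an encoder and turns the
// ArgumentException documented for unmapped characters into a false result.
public static class SafeEncoding
{
    public static bool TryEncode(Func<string, string> encode, string source, out string code)
    {
        try
        {
            code = encode(source);
            return true;
        }
        catch (ArgumentException)
        {
            // Note: ArgumentNullException derives from ArgumentException,
            // so it is caught here as well.
            code = null;
            return false;
        }
    }
}
```

With the real class, the method group can be passed directly, e.g. `SafeEncoding.TryEncode(new DaitchMokotoffSoundex().Encode, name, out var code)`.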
GetSoundex(string)
Encodes a string using the Daitch-Mokotoff Soundex algorithm with branching.
If a string is encoded into multiple codes (see the branching rules), the result contains all codes, separated by '|'. For example, the name "AUERBACH" is encoded as both:
- 097400
- 097500
Declaration
public virtual string GetSoundex(string source)
Parameters
Type | Name | Description |
---|---|---|
string | source | A string to encode. |
Returns
Type | Description |
---|---|
string | A string containing a set of DM Soundex codes corresponding to the string supplied. |
Exceptions
Type | Condition |
---|---|
ArgumentException | If a character is not mapped. |
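Because GetSoundex(string) returns every branch joined by '|', callers typically need to split the result before comparing names. The sketch below is a hypothetical helper (DmCodeSet is not part of Lucene.NET) that splits such a string and treats two names as a possible match when their code sets share at least one code. The input "097400|097500" assumes the two "AUERBACH" codes listed above are joined in that order.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Lucene.NET: works on the '|'-separated
// strings returned by GetSoundex(string).
public static class DmCodeSet
{
    // Splits a GetSoundex result into its individual codes,
    // e.g. "097400|097500" -> { "097400", "097500" }.
    public static string[] Split(string soundex) =>
        soundex.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

    // Treats two names as a possible match if their code sets overlap.
    public static bool ShareCode(string soundexA, string soundexB) =>
        Split(soundexA).Intersect(Split(soundexB)).Any();
}
```

With the real class this might be used as `DmCodeSet.ShareCode(encoder.GetSoundex(a), encoder.GetSoundex(b))`. Matching on any shared code is one reasonable policy for using branching results; the API itself only returns the codes and leaves the comparison to the caller.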