Class ColognePhonetic
Encodes a string into a Cologne Phonetic value.
Namespace: Lucene.Net.Analysis.Phonetic.Language
Assembly: Lucene.Net.Analysis.Phonetic.dll
Syntax
public class ColognePhonetic : IStringEncoder
Remarks
Implements the Kölner Phonetik (Cologne Phonetic) algorithm published by Hans Joachim Postel in 1969.
The Kölner Phonetik is a phonetic algorithm optimized for the German language. It is related to the well-known Soundex algorithm.
Algorithm
- Step 1:
After preprocessing (conversion to upper case, transcription of German umlauts, removal of non-alphabetic characters), each letter of the supplied text is replaced by its phonetic code according to the following table.
Letter | Context | Code |
---|---|---|
A, E, I, J, O, U, Y | | 0 |
H | | - |
B | | 1 |
P | not before H | 1 |
D, T | not before C, S, Z | 2 |
F, V, W | | 3 |
P | before H | 3 |
G, K, Q | | 4 |
C | at onset before A, H, K, L, O, Q, R, U, X OR before A, H, K, O, Q, U, X except after S, Z | 4 |
X | not after C, K, Q | 48 |
L | | 5 |
M, N | | 6 |
R | | 7 |
S, Z | | 8 |
C | after S, Z OR at onset except before A, H, K, L, O, Q, R, U, X OR not before A, H, K, O, Q, U, X | 8 |
D, T | before C, S, Z | 8 |
X | after C, K, Q | 8 |
(Source: Wikipedia (de): Kölner Phonetik -- Buchstabencodes)
Example:
"Müller-Lüdenscheidt" => "MULLERLUDENSCHEIDT" => "60550750206880022"
- Step 2:
Collapse each run of identical consecutive code digits into a single digit.
Example:
"60550750206880022" => "6050750206802"
- Step 3:
Remove all "0" codes except one at the beginning. Because this happens after Step 2, the result may again contain identical consecutive digits where a removed "0" had separated them.
Example:
"6050750206802" => "65752682"
Methods
Encode(string)
Declaration
public virtual string Encode(string text)
Parameters
Type | Name | Description |
---|---|---|
string | text |
Returns
Type | Description |
---|---|
string | the encoded string |
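A minimal usage sketch for `Encode`, assuming the project references the Lucene.Net.Analysis.Phonetic assembly and that `ColognePhonetic` can be constructed with a parameterless constructor (no constructor is listed on this page, so that is an assumption). The class name `EncodeExample` is hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Phonetic.Language;

public static class EncodeExample
{
    public static void Main()
    {
        // Assumes a public parameterless constructor is available.
        var encoder = new ColognePhonetic();

        // The worked example in the class remarks ends in "65752682" for this input.
        string code = encoder.Encode("Müller-Lüdenscheidt");
        Console.WriteLine(code);
    }
}
```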
GetColognePhonetic(string)
Implements the Kölner Phonetik algorithm.
In contrast to the initial description of the algorithm, this implementation does the encoding in one pass.
Declaration
public virtual string GetColognePhonetic(string text)
Parameters
Type | Name | Description |
---|---|---|
string | text |
Returns
Type | Description |
---|---|
string | The corresponding encoding according to the Kölner Phonetik algorithm |
IsEncodeEqual(string, string)
Determines whether two strings have the same Cologne Phonetic encoding.
Declaration
public virtual bool IsEncodeEqual(string text1, string text2)
Parameters
Type | Name | Description |
---|---|---|
string | text1 | |
string | text2 |
Returns
Type | Description |
---|---|
bool |
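A short sketch of how `IsEncodeEqual` might be used to compare spelling variants of a name. It assumes the Lucene.Net.Analysis.Phonetic assembly is referenced and that `ColognePhonetic` has a parameterless constructor. The class name `IsEncodeEqualExample` and the sample names are illustrative only.

```csharp
using System;
using Lucene.Net.Analysis.Phonetic.Language;

public static class IsEncodeEqualExample
{
    public static void Main()
    {
        // Assumes a public parameterless constructor is available.
        var encoder = new ColognePhonetic();

        // By the code table, "Meyer" and "Mayr" both reduce to M=6, vowels=0, R=7,
        // so the two spellings should be reported as phonetically equal.
        bool same = encoder.IsEncodeEqual("Meyer", "Mayr");
        Console.WriteLine(same);

        // Comparing the results of Encode directly is the equivalent check.
        Console.WriteLine(encoder.Encode("Meyer") == encoder.Encode("Mayr"));
    }
}
```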