Soundex - What, How, and Why
In genealogy research you will run into the word SOUNDEX. What is it? Why does it exist?
Many family history records are indexed, mercifully, not alphabetically but according to an ingenious phonetic index named "Soundex". Why? Because many people changed the spelling of their family names over time or when they moved. Easy examples are SCHMIDT and MULLER, which became SMITH and MILLER when they reached the United States. Others are REID, REED, and READ, and similar combinations. The Soundex doesn't care how names are spelled; it only cares how they sound.
How does it work? The index doesn't include any vowels because, phonetically, they can't be trusted. That eliminates A, E, I, O, and U, along with Y, W, and H. These last three are often silent or act as vowels. Next, it begins at the start of the alphabet and works its way along, sorting the remaining letters into phonetic groups.
Group 1: Let's see; "A" comes first, but we threw it out, so the first letter is "B". All the letters that pop like B are in Group 1. But a B and a V sound the same in some Spanish dialects, and a V sounds like an F in German, so let's include them. The first group is the "pop and fizz" group, and it includes B, P, F, and V.
Group 2: After B comes C, and it is the start of Group 2: the letters that "hiss and click". A hard C can sound like K, or Q, or even X. But a soft C can sound like S, which can sound like Z. And in German, a G at the end of the word sounds like a K, which is all right because an X, in Spanish, can sound like a G. Ah, but a Spanish G (or X) can also sound like J. Did you follow that? So Group 2 includes C, G, J, K, Q, S, X, and Z.
Group 3: After C comes D, the start of the "tap" group. Only T can sound like D. That's the whole group.
Group 4: Now it gets simpler. We have thrown out A, used B, C, and D, thrown out E, used F and G, thrown out H and I, used J and K, and that brings us to L. There is no other letter like L. It stands alone in Group 4.
Group 5: After L comes M. It is joined by N. They often cancel each other out, as in damn and government (which many pronounce as "guvermint").
Group 6: We have thrown out O, used P and Q, and then come to R, which stands alone in Group 6. Amazingly, that's all. We have used S and T, thrown out U, used V, thrown out W, used X, thrown out Y, and used Z.
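If you like to see things laid out as a table, here is a small Python sketch of the six groups described above. The names GROUPS, IGNORED, CODE, and group_of are made up for this illustration; they are not part of any official Soundex tool.

```python
# The six phonetic groups from the text. All names here are illustrative.
GROUPS = {
    1: "BFPV",      # the "pop and fizz" group
    2: "CGJKQSXZ",  # the "hiss and click" group
    3: "DT",        # the "tap" group
    4: "L",
    5: "MN",
    6: "R",
}

# Vowels, plus Y, W, and H, are thrown out and get no group number.
IGNORED = "AEIOUYWH"

# Flip the table around: letter -> group digit (as a string).
CODE = {letter: str(number)
        for number, letters in GROUPS.items()
        for letter in letters}


def group_of(letter):
    """Return the group digit for a letter, or None if the letter is thrown out."""
    return CODE.get(letter.upper())


print(group_of("b"), group_of("X"), group_of("a"))
```

Every one of the 26 letters lands either in a group or in the thrown-out pile, which is an easy way to check that nothing was forgotten.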
Now let's look at a few examples and see if you can't "soundex" on your own. First, keep the first letter of the name, and then find the Group number for the next three significant consonants, if there are that many, like this:
Lee = L and then no consonants, or L-000.
Linn = L and then N, but only once, because doubled consonants are only heard once = L-500, which means "a name that starts with L and is then followed by a consonant sound from Group 5, and no others".
Lind = L, then N, and then D = L-530.
Linden = L-535.
Lindner = L-535. Gotcha! The Soundex only considers the first letter and the next THREE significant sounds.
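L-535 would handle LINDEN, LONDON, LINDNER, and variations of each. Let's see what happens with SCHMIDT. Hint: it isn't S-253. The SCH combination makes only one united sound: S and C both belong to Group 2 and the H is thrown out, so, just like a doubled consonant, the C isn't counted. The D and T at the end share Group 3, so they count only once as well. S-530 handles SCHMIDT, SCHMITT, SMITH, SMYTHE, and more. M-460 handles MILLER, MULLER, MOELLER, MUELLER, and others.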
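For the curious, the steps above can be sketched in a few lines of Python. This is only a rough sketch, not an official implementation: the function name soundex is made up here, and it assumes the usual American Soundex convention that side-by-side letters from the same group count once, that H and W do not break up such a run, and that a vowel in between does.

```python
# Build a letter -> group digit table from the six groups in the text.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """Return a code such as 'S-530' for a surname (illustrative sketch)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Remember the first letter's group so a same-group neighbor is skipped
    # (this is why the C in SCHMIDT is not counted).
    previous = CODES.get(first)
    for ch in letters[1:]:
        if ch in "HW":
            continue  # assumption: H and W do not separate same-group letters
        code = CODES.get(ch)  # None for vowels and Y
        if code is not None and code != previous:
            digits.append(code)
        previous = code  # a vowel resets the run
    # Keep only the first three digits and pad with zeros if there are fewer.
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


for name in ["Lee", "Linn", "Lind", "Linden", "Lindner", "Schmidt", "Smith", "Mueller"]:
    print(name, soundex(name))
```

Running it on the names above gives the same codes worked out by hand, for example L-535 for both Linden and Lindner and S-530 for both Schmidt and Smith.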
The only real challenges are names like LEE/LEIGH, VENIA/VEGNA, CORDNER/KORDNER, THOMPSON/THOMSON, or JOHN/JOHNS. They will have different Soundex codes. If at first you don't succeed, try an alternate code.