Tag Archives: Identity Resolution
Even in “good” data there is a lot of garbage. Take a person’s name, for example: John could also be spelled Jon or Von (I have a high school sports trophy to prove it). Schmidt could become Schmitt or Smith. In Hungarian my name is Janos Kovacs. Human beings entering data make spelling, phonetic, and keypunching errors. We also have to deal with variations in compound and account names, abbreviations, nicknames, prefixes and suffixes, foreign names, and missing elements. As long as humans are involved in entering data, there will be a significant amount of garbage in any database. So how do we turn this gibberish into gems of information?
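One long-standing way to catch the phonetic variations above is a phonetic key. Spellings that sound alike share a code, so records can be grouped by code instead of by exact spelling. The sketch below uses American Soundex, chosen here only as an illustration; it is not necessarily the approach any particular identity-resolution product uses. It assumes plain ASCII Latin-letter names, and the function name `soundex` is our own. It maps Schmidt, Schmitt, and Smith to the same key, and Jon and John to the same key. It does not link Von to John, because Soundex keeps the first letter as-is. Phonetic keys alone therefore don't cover every kind of error listed above.

```python
def soundex(name: str) -> str:
    """Return the American Soundex key for a name (illustrative sketch, ASCII names assumed)."""
    # Consonant groups that share a digit; vowels, y, h, and w get no digit.
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    result = first.upper()       # the first letter is kept as a letter
    prev = codes.get(first, "")  # its digit still suppresses an identical next digit

    for ch in letters[1:]:
        if ch in "hw":
            # h and w do not separate consonants with the same digit
            continue
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev to "", so the same digit after a vowel is kept again
        prev = code
        if len(result) == 4:
            break

    return result.ljust(4, "0")  # pad to one letter plus three digits


if __name__ == "__main__":
    for n in ["John", "Jon", "Von", "Schmidt", "Schmitt", "Smith"]:
        print(n, soundex(n))
    # John J500, Jon J500, Von V500, Schmidt S530, Schmitt S530, Smith S530
```

Because many unrelated names share a key, a key match is usually treated as a candidate for closer comparison rather than as proof that two records describe the same person.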