For all the data quality developers – are you sure you’re using your standardization techniques efficiently?
Namecandy.com, a website that compiles and analyses new baby names, suspects that parents are seeking Google whacks: names that return only a single search hit. The following are some real first names from the top thousand most popular names; let’s see how we can identify them as new contacts. What, or who, are new contacts? For this exercise, names that do not exist in the reference tables are deemed to be new data. The focus here is on the Token Labeler transformation and the FirstName reference table.
Use a Token Labeler transformation and apply the FirstName reference table in Inclusive mode.
Reference table in ‘Inclusive’ mode
Profile the transformation. The new contacts are identified appropriately as WORD by the Token Labeler.
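To make the Inclusive-mode behaviour concrete, here is a minimal Python sketch of the idea. It is not Informatica's implementation: the reference values, the FIRSTNAME label name, and the functions label_tokens and find_new_contacts are hypothetical. The only detail taken from the steps above is that a name missing from the reference table ends up with the WORD label.

```python
# Hypothetical stand-in for the FirstName reference table (illustrative values only).
FIRSTNAME_REFERENCE = {"JOHN", "MARY", "DAVID"}


def label_tokens(value, reference=FIRSTNAME_REFERENCE, label="FIRSTNAME"):
    """Label each whitespace-delimited token.

    Tokens found in the reference table get `label`; every other token
    falls back to the generic WORD label.
    """
    labels = []
    for token in value.split():
        if token.upper() in reference:
            labels.append(label)
        else:
            labels.append("WORD")
    return labels


def find_new_contacts(values, reference=FIRSTNAME_REFERENCE):
    """Return the values that contain a WORD token, i.e. names not in the reference table."""
    return [v for v in values if "WORD" in label_tokens(v, reference)]
```

Calling find_new_contacts on a list of first names returns only those absent from the reference set, which is the same filtering the profile performs when you look at the WORD label.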
Now, let’s say we want to follow up with all contacts that have a specific first name or last name, for example a first name of ‘AKIER’. We can do that by using the Custom Label feature. Create a new custom label as shown.
Profile again and you can see a label of ‘new_contact Surname’ with a drilldown value of ‘AKIER’.
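The custom label step can be pictured the same way. The sketch below is an assumption-laden illustration, not the Token Labeler's actual logic: the function names, the sample reference values, the label name new_contact, and the choice to check custom labels before the reference table are all hypothetical.

```python
def label_with_custom(value, reference, custom_labels, label="FIRSTNAME"):
    """Label tokens, giving user-defined custom labels priority (assumed order).

    custom_labels maps an upper-cased token to the label it should receive.
    """
    labels = []
    for token in value.split():
        key = token.upper()
        if key in custom_labels:
            labels.append(custom_labels[key])   # custom label wins
        elif key in reference:
            labels.append(label)                # known first name
        else:
            labels.append("WORD")               # unknown token
    return labels


def drilldown(values, reference, custom_labels, target_label):
    """Return the input values that received `target_label` on any token."""
    return [
        v for v in values
        if target_label in label_with_custom(v, reference, custom_labels)
    ]


# Illustrative data: a tiny reference table and one custom label for AKIER.
reference = {"JOHN", "MARY"}
custom = {"AKIER": "new_contact"}
names = ["JOHN", "AKIER", "ZYLA"]
follow_up = drilldown(names, reference, custom, "new_contact")
```

Here follow_up holds only the contacts tagged with the custom label, while other unknown names such as ZYLA still fall through to WORD.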
Please post your comments below, send your comments to Informatica University or join our community discussion at the Informatica University LinkedIn Group.







