Now comes the fun part: inspecting the data. In this step, automated data profiling helps you identify actual problems in the data as they relate to the expectations of your business clients. Here are just a few of the questions it can answer:
- Are the phone numbers empty?
- Are the admission dates missing in inpatient hospital claims?
- Are there car loans with durations greater than 10 years?
- Do shipping records lack corresponding billing records?
- Do product descriptions differ only slightly?
- Are you delivering products to many different customers with the same address?
- What business rules are being violated?
Using a product such as Informatica Data Explorer, you can significantly speed up data inspection because much of the work is automated.
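To make a few of the questions above concrete, here is a minimal hand-written sketch in Python. It is not how Informatica Data Explorer works internally; it only illustrates what the checks amount to. The sample records, field names (`phone`, `duration_years`, `order_id`, and so on), and function names are all hypothetical, and the 10-year threshold comes from the car loan question above.

```python
from collections import defaultdict

# Hypothetical sample records; every field name here is an assumption.
customers = [
    {"id": 1, "name": "Ann Lee", "phone": "555-0100", "address": "12 Oak St"},
    {"id": 2, "name": "Bob Ray", "phone": "", "address": "12 oak st"},
    {"id": 3, "name": "Cy Dunn", "phone": None, "address": "9 Elm Ave"},
]
loans = [
    {"loan_id": "A1", "type": "car", "duration_years": 5},
    {"loan_id": "A2", "type": "car", "duration_years": 12},
]
shipments = [{"order_id": 100}, {"order_id": 101}]
billings = [{"order_id": 100}]


def empty_phones(rows):
    """Return ids of customers whose phone number is missing or blank."""
    return [r["id"] for r in rows if not (r["phone"] or "").strip()]


def long_car_loans(rows, max_years=10):
    """Return ids of car loans that run longer than max_years."""
    return [
        r["loan_id"]
        for r in rows
        if r["type"] == "car" and r["duration_years"] > max_years
    ]


def unbilled_shipments(ship_rows, bill_rows):
    """Return order ids that were shipped but have no billing record."""
    billed = {b["order_id"] for b in bill_rows}
    return [s["order_id"] for s in ship_rows if s["order_id"] not in billed]


def shared_addresses(rows, min_customers=2):
    """Group customer ids by normalized address; keep addresses shared by several."""
    by_address = defaultdict(list)
    for r in rows:
        by_address[r["address"].strip().lower()].append(r["id"])
    return {a: ids for a, ids in by_address.items() if len(ids) >= min_customers}


print(empty_phones(customers))
print(long_car_loans(loans))
print(unbilled_shipments(shipments, billings))
print(shared_addresses(customers))
```

Each function returns the offending records rather than a simple yes or no, which makes the results easier to show to business clients. The address normalization here (trimming and lowercasing) is deliberately crude; real address matching usually needs far more than that.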
Next, you begin testing your data. Generally, when building a case for data quality, you look only for the anomalies. Profiling data to find anomalies involves three major steps. First, look at the individual columns of each table. Second, look at the structure of the individual tables. Finally, look for relationships between tables. Here are some basic questions to ask:
- Are the key fields unique?
- Are all the important fields populated?
- Are date fields dates or some other datatype?
- Are natural keys unique?
- Are relationships between tables intact?
- Are there non-unique descriptions in a table with unique keys?
- Can you validate entries against reference data?
- Are there duplicate entries for the same subject?
- Do all the values in a column follow the same pattern?
- Does the data conform to basic business rules?
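The three profiling steps (columns, table structure, relationships between tables) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a substitute for a profiling tool: the `patients` and `claims` tables, their field names, and the helper names are all hypothetical. The pattern check maps every digit to `9` and every letter to `A` so that values can be compared by shape.

```python
import re
from collections import Counter


def value_pattern(text):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", text))


def profile_column(rows, column):
    """Step 1 (columns): count nulls, distinct values, and value patterns."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    patterns = Counter(value_pattern(str(v)) for v in non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": dict(patterns),
    }


def duplicate_keys(rows, key):
    """Step 2 (table structure): key values that appear more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphans(child_rows, fk, parent_rows, pk):
    """Step 3 (relationships): child rows whose foreign key has no parent."""
    parent_keys = {p[pk] for p in parent_rows}
    return [c for c in child_rows if c[fk] not in parent_keys]


# Hypothetical sample tables; all names and values are made up.
patients = [{"patient_id": 1}, {"patient_id": 2}, {"patient_id": 2}]
claims = [
    {"claim_id": "C1", "patient_id": 1, "admit_date": "2023-01-15"},
    {"claim_id": "C2", "patient_id": 2, "admit_date": "01/20/2023"},
    {"claim_id": "C3", "patient_id": 9, "admit_date": ""},
]

print(profile_column(claims, "admit_date"))
print(duplicate_keys(patients, "patient_id"))
print(orphans(claims, "patient_id", patients, "patient_id"))
```

On this sample data, the column profile shows one missing admission date and two competing date formats, the key check flags a repeated `patient_id`, and the orphan check returns the claim that points at a patient who does not exist.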
For more on Building a Business Case for Data Quality, read my first post.
Please share your feedback and, of course, any questions you might have.