So wrote Potter Stewart, Associate Justice of the Supreme Court, in his concurring opinion in Jacobellis v. Ohio (1964). He was talking about pornography. The same holds true for data. For example, most business users have a hard time describing exactly what data they need for a new BI report, including which source system to get it from, in terms precise enough for designers, modelers and developers to build the report right the first time. But sit down with a user in front of an analyst tool and profile the potential source data, and they will tell you in an instant whether it’s the right data or not.
Let me continue with another provocative statement. Profiling source data in a data integration initiative should not be a best practice; it should be the ONLY practice. In other words, it should not be optional – no questions asked.
Anyone who has worked on a data integration or data migration project knows that if the first time you look at real production data is during QA, you’re in for multiple test cycles to address the data anomalies that surface. Or worse yet, like a game of Chutes and Ladders, you’re back to square one, redesigning the solution to work around the issues you’ve uncovered.
Years ago you could be excused for not profiling source data in the requirements phase. The functional analysts and business users didn’t have the tools to access the data and had to rely on data architects or developers to write custom queries or hand-coded programs to profile it – a pain, since it amounted to launching yet another mini-project that added weeks or months to the schedule. So they wrote the requirements in Word documents based on prior documentation or what they “thought” the data should be. After weeks of discussions and many pages of requirements documents, it was still hit-and-miss whether the needs were described in a way that accurately captured the intent.
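To make the "hand-coded profiling" idea concrete, here is a minimal sketch of the kind of one-off program a developer would have written, using only the Python standard library. The field names and sample rows are hypothetical; a real profile would run against the actual source tables and cover many more checks (patterns, ranges, cross-field rules).

```python
from collections import Counter

# Hypothetical sample of extracted source rows; in practice these
# would come from a query against the production system (or a copy).
rows = [
    {"customer_id": "1001", "state": "CA", "signup_date": "2010-03-14"},
    {"customer_id": "1002", "state": "ca", "signup_date": ""},
    {"customer_id": "1003", "state": "NY", "signup_date": "2011-07-02"},
    {"customer_id": "1003", "state": None, "signup_date": "2011-07-02"},
]

def profile_column(rows, field):
    """Basic column profile: row count, nulls, distinct values, top values."""
    values = [r[field] for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),     # None or empty string
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

for field in ("customer_id", "state", "signup_date"):
    print(field, profile_column(rows, field))
```

Even a toy profile like this surfaces exactly the anomalies that otherwise wait until QA: mixed-case state codes, blank dates, and duplicate keys.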
Now there is no longer an excuse. With data profiling tools that are as easy to use as a spreadsheet, business users can profile data, and even create basic source-to-target mappings, in a browser-based tool. It’s remarkable what happens when users actually look at the data – they can tell you in an instant whether it’s correct or not. The user experience with the Informatica Analyst tool is a case in point. After just a few hours in front of a profiling tool connected to the actual source data (or a copy of it), there is little doubt about which fields should be extracted, what they mean, which data content variations are of concern and which ones just don’t matter.
A few months ago I wrote a blog article on Agile DI and Agile BI. The tools and technology involved do indeed enable an agile approach, but arguably the most agile technique of all is simply to look at the data early. Not only does it dramatically shorten the requirements definition phase (by up to 90% in many cases), it also communicates the requirements to designers and developers much more clearly, so the rest of the project speeds up too. Now THAT is agile!