We have been looking at how data management issues can be classified. In my last post I provided five categories but broke them down into two groups: Systemic and System. The systemic issues are ones in which process or management gaps allow data flaws to be introduced. A good example occurs when consumers of reports from the data warehouse insist that the data sets are incomplete, and the root cause is that the processes in which the data is initially collected or created do not comply with the downstream requirement for capturing the missing values.
The failure is not a technical one: the systems work the way they were originally intended. However, there may be gaps in communication and collaboration, evidenced by the lack of a process to solicit requirements from those who are repurposing data as well as the absence of controls to monitor observance of the downstream expectations.
It may seem odd for me to say this, but you might find it comforting to know that the typical ways one uses the standard toolset will not help solve these kinds of problems. Rather, you can start to see how the tools you have at your disposal can be used to adjust the processes. Consider a data profiling tool, which is typically used to analyze a data set as part of an undirected data quality assessment. The same profiling tool can also be used both as a repository for specified data quality expectations and as the method for validating observance of those expectations.
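To make the idea of "repository plus validator" a little more concrete, here is a minimal sketch in Python. It is not the interface of any particular profiling tool. The names `Expectation`, `not_missing`, and `validate`, along with the sample fields and records, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# One row of the data set; the field names are made up for this sketch.
Record = Dict[str, Optional[str]]


@dataclass(frozen=True)
class Expectation:
    """A downstream data quality expectation, stored as data."""
    name: str
    field: str
    check: Callable[[Optional[str]], bool]


def not_missing(value: Optional[str]) -> bool:
    """True when a value is present and not blank."""
    return value is not None and value.strip() != ""


def validate(records: List[Record], expectations: List[Expectation]) -> Dict[str, int]:
    """Count, per expectation, how many records fail to observe it."""
    violations = {e.name: 0 for e in expectations}
    for record in records:
        for e in expectations:
            if not e.check(record.get(e.field)):
                violations[e.name] += 1
    return violations


# The "repository": expectations solicited from downstream consumers.
EXPECTATIONS = [
    Expectation("postal_code is captured", "postal_code", not_missing),
    Expectation("email is captured", "email", not_missing),
]

# Hypothetical sample data standing in for records at the point of capture.
SAMPLE = [
    {"customer_id": "1", "postal_code": "20901", "email": "a@example.com"},
    {"customer_id": "2", "postal_code": "", "email": "b@example.com"},
    {"customer_id": "3", "postal_code": None, "email": None},
]

if __name__ == "__main__":
    for name, count in validate(SAMPLE, EXPECTATIONS).items():
        print(f"{name}: {count} violation(s)")
```

The point of the sketch is the separation: the expectations live in one place as declared rules, and the same rules are run against the data wherever it is created, so the downstream requirement becomes something the upstream process can be monitored against.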
You might not have thought of that approach without understanding the different types of issues. But perhaps a more important value is derived from the ability to assess the scale of the different types of issues as a prelude to prioritizing how you will address them.