Yet Again a Note on Data Quality
The easiest way to understand the importance of Data Quality, especially for those who deal with data processing daily, is to look around, pick up any physical object of daily use, and imagine its transformation from a mix of raw ingredients into a final product that is left unfinished. Would that object still be preferred?
Imagine a product of your choice, say a T-shirt that you frequently work out in. After wearing it for a couple of days, you find that its quality is quickly wearing out. Had you known about the quality issues beforehand, you might not have purchased it in the first place.
The point is that all products are manufactured under well-defined, well-understood, and well-implemented quality-controlled processes. One failure would set aside the entire lot, which then either needs to be reworked (in select cases) or is completely discarded, especially when it relates to life, health, or personal security.
Would the FDA ever compromise on any of the consumer parameters? Never!
But when it comes to data, the whole system seems to be flexible for one reason or another.
The final products consumed by business users are the standard reports, dashboards, or analytical models that rely on data. Many times, this data itself lacks basic nutritional value (accuracy), but is still processed to keep things (like reports) live.
If we as consumers would never accept a carton of milk of inferior quality, how and why do we use compromised data, or continue to live with processes that are not robust enough to catch lame data points? Worst of all, most data problems originate at the source.
Lame data will cause limpness in the outcomes.
In most cases, it is the IT team that processes the data and maintains the application. It is expected to monitor, control, correct, or refurbish the bad data and keep the system rolling.
Bad data quality costs billions each year, directly or indirectly, and still not enough is done to eradicate bad data.
Only a detailed diagnosis will help identify the root cause (read: DQ issues and their causes), and the remedy may include completely eliminating certain food intake (read: a data source).
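To make the manufacturing comparison concrete, here is a minimal sketch of a quality gate that treats a batch of records like a production lot: one failed check sets aside the whole lot instead of letting it flow into a report. The field names (customer_id, amount), the rules, and the function names are hypothetical and only serve the illustration; a real pipeline would define its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    """A named rule that a single record must satisfy."""
    name: str
    passes: Callable[[dict], bool]


# Hypothetical rules; real ones come from the business definition of the data.
CHECKS = [
    Check("customer_id present", lambda r: bool(r.get("customer_id"))),
    Check(
        "amount is a non-negative number",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]


def inspect_lot(records, checks=CHECKS):
    """Return every (row_index, check_name) pair that failed."""
    failures = []
    for index, record in enumerate(records):
        for check in checks:
            if not check.passes(record):
                failures.append((index, check.name))
    return failures


def release_lot(records, checks=CHECKS):
    """Release the lot only if every record passes; otherwise reject it whole."""
    records = list(records)
    failures = inspect_lot(records, checks)
    if failures:
        raise ValueError(f"lot rejected, {len(failures)} failed check(s): {failures}")
    return records
```

Raising an error here is a deliberate design choice for the sketch: it mirrors the "put aside the entire lot" rule, so the report is not refreshed with data that failed inspection.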
So what is required?
Each organization needs a data council, which must conduct data quality audits for units (projects and applications) that generate, process or consume data. Then, it must issue a certificate of compliance with actionable items on the identified gaps.
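As a rough illustration of what such an audit could produce, the sketch below checks one unit's records for missing required fields and prints a plain-text certificate listing the gaps as action items. Everything in it is an assumption made for the example: the unit name, the field names, the missing-rate threshold, and the wording of the certificate. It is not a standard or an existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class AuditResult:
    """Outcome of auditing one unit (a project or application)."""
    unit: str
    gaps: list = field(default_factory=list)

    @property
    def compliant(self):
        return not self.gaps


def audit_unit(unit, records, required_fields, max_missing_rate=0.0):
    """Flag every required field whose missing rate exceeds the threshold."""
    result = AuditResult(unit)
    total = len(records)
    if total == 0:
        result.gaps.append("no records supplied; provide a sample to audit")
        return result
    for name in required_fields:
        missing = sum(1 for r in records if r.get(name) in (None, ""))
        rate = missing / total
        if rate > max_missing_rate:
            result.gaps.append(
                f"{name}: {rate:.0%} missing; fix at the source or add a default rule"
            )
    return result


def certificate(result):
    """Render the audit result as a short certificate with action items."""
    status = "COMPLIANT" if result.compliant else "NON-COMPLIANT"
    lines = [f"Data quality audit: {result.unit}: {status}"]
    lines += [f"  action item: {gap}" for gap in result.gaps]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""}]
    print(certificate(audit_unit("billing", sample, ["id", "email"])))
```

Completeness is only one dimension an audit might cover; accuracy, timeliness, and consistency checks would follow the same pattern of measuring a gap and attaching an action item to it.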
The business (end clients) would appreciate internal data audits by the organizations. The organizations will benefit from lower maintenance costs in the long run, which increases margins on the maintenance contract, a benefit that can also be passed on to the end clients.
I will conclude on this note: if we don't compromise on road safety, food safety, social security, or financial safety, then why are we (IT) flexible about data quality issues and processes? This is something to ponder and work upon.