Yet Again a Note on Data Quality

The easiest way to understand the importance of data quality, especially for those who deal with data processing daily, is to look around, pick up any physical object you use every day, and imagine that its transformation from a mix of raw ingredients had stopped at an unfinished product. Would that object still be preferred?

Not really!

Imagine a product of your choice, say a T-shirt that you frequently work out in. After wearing it for a couple of days, you find that it is quickly wearing out. Had you known about the quality issues beforehand, you might not have purchased it in the first place.

The point is that all products are manufactured under well-defined, well-understood, and well-implemented quality-control processes. One failure would set aside the entire lot, which then needs to be either reworked (in select cases) or completely discarded, especially when it is related to life, health, or personal security.

Would the FDA ever compromise on any of the consumer parameters? Never!

But when it comes to data, the whole system seems to be flexible for one reason or the other.

The final products consumed by business users are the standard reports, dashboards, or analytical models that rely on data. Many times, this data itself lacks basic nutritional value (accuracy), but it is still processed to keep things (like reports) live.

If we as consumers would never accept a carton of milk of inferior quality, how and why do we use compromised data, or continue to live with processes that are not robust enough to catch the lame data points? Worst of all, most data problems originate at the source.

Lame data will cause limpness in the outcomes.

In most cases, it is the IT team that processes the data and maintains the application. It is expected to monitor, control, correct, or refurbish the bad data and keep the system rolling.

Bad data quality costs billions each year, directly or indirectly, and still not enough is done to eradicate bad data.

Only a detailed diagnosis will help identify the root cause (read: the DQ issues and their causes), and the remedy may include completely eliminating a certain food intake (read: a data source).

So what is required?

Each organization needs a data council, which must conduct data quality audits for units (projects and applications) that generate, process or consume data. Then, it must issue a certificate of compliance with actionable items on the identified gaps.

The business (end clients) would appreciate internal data audits by the organizations. The organizations will benefit from lower maintenance costs in the long run, which increases margins on the maintenance contract, a benefit that can be extended to the end clients as well.

Let me conclude with a question: if we don’t compromise on road safety, food safety, social security, or financial safety, then why are we (IT) so flexible about data quality issues and processes? This is something to ponder and work upon.

Comments

  • I didn’t understand the information you provided. I’m a beginner in Informatica, so I need clearer information.

    Please add some related videos to help me understand it more quickly and easily.

    • Anand Rawat

      Hi Brinda, thanks for the comments. This is less about Informatica as a tool and more about the concept of data quality. The point is simple: the data we use in our reports, dashboards, or any analytics program must be of good quality; otherwise it is of no use.
      A very simple example: you have an Excel sheet with first name, last name, and gender. If 40% of the records have blanks in the gender column, a simple report may give a result like 35% female, 25% male, 40% others, and you will not be able to decide what percentage of the population is male vs. female.

      Hence, before this data is consumed for reporting, the team providing it must ensure that all fields are populated with relevant information. Only then will you be able to see that 55% of the population is female and the remaining 45% is male, and that makes the report meaningful.
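
The Excel example in the reply above can be reproduced in a few lines of Python. This is only a minimal sketch: the record layout, the `gender_report` and `completeness` helper names, and the sample rows are assumptions made up for illustration, not part of Informatica or any other tool. It builds 100 hypothetical records with 40 blank gender values, reports the shares, and measures how complete the column is.

```python
from collections import Counter


def gender_report(records):
    """Percentage share of each gender value; blanks are counted as 'unknown'."""
    counts = Counter(
        (r.get("gender") or "").strip().lower() or "unknown" for r in records
    )
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


def completeness(records, field):
    """Fraction of records in which `field` is populated (non-blank)."""
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


# Hypothetical data: 35 female, 25 male, 40 blank, as in the example above.
raw = (
    [{"gender": "female"}] * 35
    + [{"gender": "male"}] * 25
    + [{"gender": ""}] * 40
)

# The same 100 records after the data provider has filled in the blanks.
fixed = [{"gender": "female"}] * 55 + [{"gender": "male"}] * 45

report_raw = gender_report(raw)
report_fixed = gender_report(fixed)

print(report_raw)                   # shares with blanks lumped into 'unknown'
print(completeness(raw, "gender"))  # how much of the column is usable
print(report_fixed)                 # shares once every record is populated
```

A check like `completeness` could run before the report is refreshed, so that a column that is only 60% populated is flagged instead of silently producing a misleading split.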