Data Integration - Informatica

Informatica Data Quality

Is there life on Mars?

Chris McCauley

This week NASA announced that it may have discovered evidence of water flowing on the surface of Mars in the recent past. This raises the possibility that life existed on Mars in the past and may even exist there today.

A lot of talent, time and money has gone into addressing the fundamental question: is there life on Mars? In addition to the two rovers on the surface, there are currently three spacecraft in operation around Mars: Mars Odyssey, Mars Reconnaissance Orbiter and Mars Express. Sadly, the Mars Global Surveyor, which is responsible for this week's exciting news, has probably suffered a severe failure and is in effect lost. Each spacecraft has been sent out to gather some basic statistics about the Red Planet, such as how the surface changes over time, the percentage of carbon dioxide at the poles or how the temperature varies throughout the atmosphere. All of this is done in the hope that we move closer to a definitive answer to that fundamental question. But forget the answer: are we asking the right questions?

The fundamental problem facing NASA and its European counterpart, the ESA, when planning Mars missions is knowing what to look for. We can gather as much data as our tools allow, but how do we know which measurements to make? Will the Global Surveyor photographing Mars from orbit tell us much about the possibilities of life existing under the surface? Are there more appropriate measurements to make and better questions to ask? To address these concerns, NASA and the ESA employ multidisciplinary teams to discover the most appropriate metrics to gather and how best to make the supporting measurements.

There is an analogy here with data quality. A key goal of data quality is to deliver dashboards and quality metrics that give business executives the confidence to make strategic decisions based on demonstrably reliable data. Unless we are asking the correct questions and making the right measurements, all those pretty dashboards are not just worthless; they are misleading and therefore dangerous.

We can take the naïve approach of gathering basic statistics about our data: the percentage of missing values, the distribution of values, even the correlation between values in one field and those in another. But this doesn’t address the fundamental question: IS THE DATA FIT FOR USE? Like the space agencies, we don’t want to gather just any available metric; we need to identify the correct measurements to make and figure out how to make those measurements.
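To make the naïve approach concrete, here is a minimal Python sketch of those basic statistics. It assumes a column is just a list of values and that `None` and the empty string both count as missing; the names `profile_column` and `pearson` and the sample data are made up for illustration and do not come from any particular data quality product.

```python
from collections import Counter
from math import sqrt


def profile_column(values):
    """Basic technical profile of one column (hypothetical helper)."""
    # Assumption: None and the empty string both count as "missing".
    present = [v for v in values if v is not None and v != ""]
    missing = len(values) - len(present)
    return {
        "missing_pct": 100.0 * missing / len(values) if values else 0.0,
        "distribution": Counter(present),
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample data, for illustration only.
countries = ["IE", "", "US", None, "IE"]
profile = profile_column(countries)
print(profile["missing_pct"])    # share of missing values, in percent
print(profile["distribution"])   # how often each value occurs
print(pearson([1, 2, 3], [2, 4, 6]))  # correlation between two fields
```

Notice that nothing in this sketch knows what the column is for. It will happily report on a field of country codes without being able to say whether those codes are good enough for the business to rely on.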

When we talk about measuring data we talk about gathering metadata. I find it useful to organize the possible metadata into perspectives.

This seems to make it easier to explain to people what metadata is really about (please don't say it's data-about-data) and where to look for it, if not how to actually gather it. You could consider each perspective to be “looking” at different pieces of the totality of metadata associated with the “real” data. I also often make the distinction between technical metadata and business metadata. By technical metadata, I mean the perspective on the data that would be familiar to an IT user such as a DBA or programmer. From this perspective, issues such as how the data is stored, indexed and retrieved are the prime consideration. By business metadata, I mean the perspective that views the data as representing facets of the business, such as customer addresses, purchase orders and credit ratings. From this perspective, the measurements that matter apply to how the data is used and to the business relationships between the data items.

Measurements applied at the technical level are largely independent of the business domain. In the same way that the Mars missions gather basic measurements about the climate on that planet, this technical metadata adds to our overall understanding of the data but does not directly address the fundamental question of fitness for purpose. Technical metadata can be produced with little or no human intervention (relational database management systems, ETL tools and data quality products produce some or all of these metrics on the fly or as a side effect of other work), but this convenience should also make us question the ultimate value of this metadata. We need to look to the business specialists to help us find the right questions to ask and the right measurements to make.

I believe that we need tools which can be used by people beyond the IT department. We need to allow the business user to decide what should be measured, and not just leave it to IT. If we do not recognize that data quality is a problem requiring a multidisciplinary approach, then we are not addressing the fundamental question: is this data fit for use?
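One way to picture a business-driven measurement is as a set of named rules that a business specialist could read and sign off on, each checked against every record. The Python sketch below is only an illustration under stated assumptions: the rules, the field names (`postcode`, `credit_rating`), the accepted grades and the threshold are all hypothetical, and a real tool would let the business user define them without writing code.

```python
# Hypothetical business rules, written as (name, predicate) pairs.
# The field names and accepted grades are assumptions for illustration.
RULES = [
    ("address has a postcode", lambda r: bool(r.get("postcode"))),
    ("credit rating is a known grade",
     lambda r: r.get("credit_rating") in {"A", "B", "C"}),
]


def rule_pass_rates(records, rules):
    """Return the fraction of records that satisfy each named rule."""
    rates = {}
    for name, predicate in rules:
        passed = sum(1 for r in records if predicate(r))
        rates[name] = passed / len(records) if records else 0.0
    return rates


def fit_for_use(records, rules, threshold):
    """True only if every rule passes on at least `threshold` of records."""
    rates = rule_pass_rates(records, rules)
    return all(rate >= threshold for rate in rates.values())


# Made-up customer records, for illustration only.
customers = [
    {"postcode": "D02", "credit_rating": "A"},
    {"postcode": "", "credit_rating": "B"},
    {"postcode": "BT1", "credit_rating": "Z"},
    {"postcode": "SW1", "credit_rating": "C"},
]
print(rule_pass_rates(customers, RULES))
print(fit_for_use(customers, RULES, threshold=0.9))
```

The point of the design is that the rule names and the threshold are business decisions. IT can run the check, but only someone who knows how the data will be used can say which rules belong in the list and how strict the threshold should be.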

Is there life on Mars? I sincerely hope so, but I know that NASA and the ESA are not relying only on the rocket scientists to help them find the answer.
Data quality, rocket science and little green men. Are the right people in your organization asking the right questions to find out if your data is good enough to be used in a business context?
