Lessons From Kindergarten: The ABC’s of Data

People are obsessed with data. Data captured from our smartphones. Internet data showing how we shop and search — and what marketers do with that data. Big Data, which I loosely define as people throwing every conceivable data point into a giant Hadoop cluster with the hope of figuring out what it all means.

Too bad all that attention stems from fear, uncertainty and doubt about the data that defines us. I blame the technology industry, which, in the immortal words of Cool Hand Luke, has had a “failure to communicate.” For decades we’ve talked the language of IT and left it up to our direct customers to explain the proper care and feeding of data to their business users. Small wonder it’s way too hard for regular people to understand what we, as an industry, are doing. After all, how can we expect others to explain the do’s and don’ts of data management when we haven’t clearly explained it ourselves?

I say we need to start talking about the ABC’s of handling data in a way that’s easy for anyone to understand. I’m convinced we can because, if you think about it, everything you need to know about data you learned in kindergarten: It has to be clean, connected and safe. Here’s what I mean:

Clean

Data cleanliness has always been important, but it assumes real urgency with the move toward Big Data. I blame Hadoop, the underlying technology that makes Big Data possible. On the plus side, Hadoop gives companies a cost-effective way to store, process and analyze petabytes of nearly every imaginable data type. And that’s the problem: companies then go through the enormous time suck of cataloging and organizing vast stores of data. Put bluntly, Big Data can be a swamp.

The question is how to make it potable. This isn’t always easy, but it’s always, always necessary. It begins, naturally, by ensuring the data is accurate, de-duped and complete.
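
To make "de-duped and complete" concrete, here is a minimal sketch in Python. The field names (`customer_id`, `email`, `zip_code`) and the matching rule are assumptions made up for illustration, not anything a particular tool prescribes. Accuracy is not covered here, since checking it usually requires comparing against a trusted reference source.

```python
# Hypothetical schema: these required fields are assumptions for the example.
REQUIRED_FIELDS = ("customer_id", "email", "zip_code")


def normalize(record):
    """Trim whitespace and lowercase string values so near-duplicates match."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def clean(records):
    """Split records into (kept, rejected).

    A record is rejected when a required field is missing or empty.
    A record is silently dropped when its (customer_id, email) pair
    has already been seen, which is the assumed duplicate rule here.
    """
    seen = set()
    kept, rejected = [], []
    for raw in records:
        record = normalize(raw)
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            rejected.append(record)  # incomplete
            continue
        key = (record["customer_id"], record["email"])
        if key in seen:
            continue  # duplicate
        seen.add(key)
        kept.append(record)
    return kept, rejected
```

Keeping the rejected records, rather than discarding them, leaves a trail for whoever has to fix the source system that produced them.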

Connected

Now comes the truly difficult part: knowing where that data originated, where it’s been and how it’s related to other data. That data provenance, or lineage, is absolutely vital in our hyper-connected world, where one company’s data interacts with data from suppliers, partners and customers. Someone else’s dirty data, regardless of origin, can ruin reputations and drive down sales faster than you can say “Target breach.” In fact, we now know that hackers entered Target’s point-of-sale terminals through a supplier’s project management and electronic billing system. We won’t know the full extent of the damage for a while. We do know the hack may have affected as much as one-third of the entire U.S. population. Which brings us to:

Safe

Obviously, being safe means keeping data out of the hands of criminals. But it doesn’t stop there. That’s because today’s technologies make it oh so easy to misuse the data we have at our disposal. If we’re really determined to keep data safe, we have to think long and hard about responsibility and governance. We have to constantly question the data we use, and how we use it. Questions like:

  • How much of our data should be accessible, and by whom?
  • Do we really need to include personal information, like social security numbers or medical data, in our Hadoop clusters?
  • When do we go the extra step of making that data anonymous?
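
As one illustration of that last question, here is a minimal Python sketch that swaps a sensitive value for a keyed hash before the record is loaded anywhere shared. The field name `ssn` and the function name `pseudonymize` are hypothetical. Strictly speaking this is pseudonymization rather than full anonymization: anyone holding the key can recompute the tokens, so the key has to be guarded as carefully as the original data.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as sensitive in this example.
SENSITIVE_FIELDS = ("ssn",)


def pseudonymize(record, secret_key):
    """Return a copy of the record with sensitive fields replaced by tokens.

    Each token is an HMAC-SHA256 digest of the value, so the same input
    always maps to the same token (records can still be joined), but the
    original value cannot be read back from the token without the key.
    """
    masked = dict(record)  # leave the caller's record untouched
    for field in SENSITIVE_FIELDS:
        value = masked.get(field)
        if value is not None:
            masked[field] = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
    return masked
```

Because the tokens are deterministic, analysts can still count or link customers without ever seeing the underlying number.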

And as I think about it, I realize that everything we learned in kindergarten boils down to the ethics of data: How, for example, do we know if we’re using data for good or for evil?

That question is especially relevant for marketers, who have a tendency to use data to scare people, for crass commercialism, or to violate our privacy just because technology makes it possible. Use data ethically, and we can help change that.

In fact, I believe that the ethics of data is such an important topic that I’ve decided to make it the title of my new blog.

Stay tuned for more musings on The Ethics of Data.

This entry was posted in Big Data, Data Governance.

2 Responses to Lessons From Kindergarten: The ABC’s of Data

  1. Clarke Patterson says:

    Nice article, Marge. I would like to refute one point, however, and that is your statement that Hadoop is to blame for data quality in big data. Data quality in any scenario (big or small data) is a process problem, not a technology problem. It’s the people, systems and processes that create the data quality issues in the first place. This is why organizations build data governance teams and assign data stewards. That said, technology certainly can complicate things. Any organization looking to deploy Hadoop must consider how it will weave into their quality and governance programs; otherwise, problems will ensue. At the end of the day, data is just data. If we don’t handle it properly in all cases, then it doesn’t matter what technology is at the heart of an infrastructure: quality problems will most certainly come up. As you say, “It begins, naturally, by ensuring the data is accurate, de-duped and complete.”

  2. Marge Breya says:

    Thanks for the comment! I don’t disagree that dirty data comes in all sizes — big and small. I believe that the next steps in big data will identify the clean, the dirty and the sensitive. And then help business people from all walks of life to prepare that data so that it is outcome-ready. Hadoop is a great technology. It’s the people, processes and machines that use it who need to be aware of its state!
