Driving Analytics Data Management Competence with Metadata
Welcome back to our series of Analytics Chalk Talk videos. I hope you enjoyed our last video: “Data Management Competency for Analytics at the Speed of Business.” In this edition, I’m discussing the magic of metadata and how it will help you attack the amount of time needed for data prep so your analysts can get to great data quicker.
Please watch the video or read the transcript. You might be able to tell I’m pretty passionate about this particular area, so share your views in the comments below or on Twitter at @leanlyle.
Twenty years ago, if you set about building a data warehouse, you’d have been prepared to spend 75 percent of your time doing the ETL. These days it’s accepted that you’ll spend 80 percent of your time on data preparation for the data scientist.
Those are massive, painful chunks of time. And yet all the love goes into the shiny presentation layer. We need to use data management to combat that 80 percent figure and serve data to the business more quickly. But how?
The answer is metadata. To me, the difference between success and failure is proportional to the investment an organization makes in its metadata management system. But it doesn’t stop there—metadata should also be everyone’s guide as to which data preparation artifacts can be reused.
For over a dozen years my mantra at Informatica (I even put it over the door) was “Use the computer.” It drove the R&D behind the product, which was all about using metadata to process and understand the data itself: to be able to, for example, recognize social security numbers and credit card numbers and figure out what kind of cleansing was needed. And then do it automatically, reusing work that’s been done before.
That’s the magic of metadata. Now it needs to be in the hands of the people doing data prep so we can attack that 80 percent.
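To make “use the computer” concrete, here is a minimal sketch of that kind of recognition. It is an illustration only, not how any Informatica product works: the names `classify_column`, `luhn_valid`, and `CLEANSING_RULES`, the 90 percent threshold, and the cleansing descriptions are all assumptions made up for this example. It guesses what a column holds by sampling its values, then looks up a suggested cleansing rule.

```python
import re

# Assumed format for illustration: US social security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
# Card numbers may contain only digits, spaces, and hyphens.
CARD_CHARS = re.compile(r"^[0-9 -]+$")


def luhn_valid(number: str) -> bool:
    """Return True if the value passes the Luhn checksum used by payment cards."""
    if not CARD_CHARS.match(number):
        return False
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_column(values, threshold=0.9):
    """Guess a semantic type from sample values (hypothetical helper)."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "unknown"
    ssn_share = sum(bool(SSN_PATTERN.match(v)) for v in values) / len(values)
    card_share = sum(luhn_valid(v) for v in values) / len(values)
    if ssn_share >= threshold:
        return "ssn"
    if card_share >= threshold:
        return "credit_card"
    return "unknown"


# Hypothetical mapping from detected type to a suggested cleansing step.
CLEANSING_RULES = {
    "ssn": "mask all but the last four digits",
    "credit_card": "strip separators, then tokenize",
    "unknown": "route to a data steward for review",
}

if __name__ == "__main__":
    sample = ["123-45-6789", "987-65-4321"]
    detected = classify_column(sample)
    print(detected, "->", CLEANSING_RULES[detected])
```

The point of the sketch is that the detected type is itself metadata: once it is recorded against the column, nobody has to rediscover it the next time that data is prepared.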
That’s the start. Next, you need good data, because the more sophisticated your processing, the better the data needs to be. With those two things in place we can move towards self-service.
Self-service is not just a case of handing over easy-to-use BI tools to a bunch of people. You have to prepare, annotate, and semantically define all the data to make it usable and trustworthy.
The key to this is looking at integration and metadata as a factory. There are a small number of patterns that we’re repeating. Realize that, construct a factory, and look at metadata management as its foundation. The benefits will be huge: benefits in how we’re dealing with the cloud, how we’re integrating data in a hybrid environment, how we’re dealing with big data in a Hadoop environment, in a NoSQL environment. Deal with all these different things as if they were a factory and think of metadata management as the foundation of it all.
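One way to picture the factory idea is a small registry of reusable preparation steps, with metadata deciding which steps each column gets. This is a sketch under stated assumptions, not a description of any real product: `STEP_REGISTRY`, `COLUMN_METADATA`, `build_pipeline`, and the step names are all hypothetical.

```python
from typing import Callable, Dict, List

# Registry of reusable preparation steps: the small number of repeating patterns.
STEP_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that records a step in the registry under a metadata-friendly name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        STEP_REGISTRY[name] = fn
        return fn
    return decorator


@register("trim")
def trim(value: str) -> str:
    return value.strip()


@register("upper")
def upper(value: str) -> str:
    return value.upper()


@register("mask_ssn")
def mask_ssn(value: str) -> str:
    # Keep only the last four characters visible.
    return "***-**-" + value[-4:]


# Hypothetical metadata: each column declares the steps it needs, by name.
COLUMN_METADATA: Dict[str, List[str]] = {
    "customer_name": ["trim", "upper"],
    "ssn": ["trim", "mask_ssn"],
}


def build_pipeline(column: str) -> Callable[[str], str]:
    """Assemble a column's pipeline from its metadata instead of hand-coding it."""
    steps = [STEP_REGISTRY[name] for name in COLUMN_METADATA[column]]

    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value

    return run


if __name__ == "__main__":
    print(build_pipeline("ssn")(" 123-45-6789 "))
    print(build_pipeline("customer_name")("  ada lovelace "))
```

Adding a new column here means adding a line of metadata, not writing a new pipeline, which is the reuse the factory view is after.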
Other videos in the series: