3 Reasons Why Data Lakes Have Not Delivered Business Value and What You Can Do About It
As a new employee at Informatica leading product marketing for our big data management portfolio, I’ve been spending a lot of time evaluating the data management landscape for analytics. One CIO.com article caught my eye. In “6 Data Analytics Trends That Will Dominate 2018,” journalist Thor Olavsrud posited that “Data lakes will need to demonstrate business value or die.”
Well, we’re just a few weeks from the end of 2018 and I’m going to say that no, data lakes have not demonstrated business value this year. Quite a radical statement for a new Informatica employee to make, I know. But let me explain my thinking.
Overall, businesses are still using data lakes in experimental, skunkworks efforts. Data scientists and business analysts appreciate the agility of being able to quickly access data for experimentation and to answer hypotheses to solve business challenges. But getting from experimentation to operationalization across the enterprise is still a struggle for many organizations. There are three reasons why:
1. Some data lakes are still data swamps
Data lakes have become dumping grounds for all sorts of data with varying quality. While it’s great to be able to analyze a broad set of data to discover new insights, the data has to be trusted. Give your data lake a makeover with a data quality strategy that applies rules to cleanse and enrich the data. Check out this recent blog post, “Optimize the Data Pipeline for Big Data” for more about strategies to ensure high quality data for big data analytics.
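To make the idea of cleansing and enrichment rules concrete, here is a minimal sketch in Python. The record fields and the specific rules (standardize email casing, validate format, normalize country codes, fill missing revenue) are illustrative examples, not taken from any particular product.

```python
# Sketch: a simple rule-based cleansing pass over raw customer records.
# Field names and rules are hypothetical, for illustration only.

records = [
    {"email": " Alice@Example.COM ", "country": "us", "revenue": "1200"},
    {"email": "not-an-email",        "country": "US", "revenue": None},
]

def cleanse(record):
    """Apply basic data quality rules: standardize, validate, enrich."""
    r = dict(record)
    r["email"] = r["email"].strip().lower()          # standardize formatting
    r["valid_email"] = "@" in r["email"]             # validate (flag, don't drop)
    r["country"] = r["country"].upper()              # normalize country codes
    r["revenue"] = float(r["revenue"]) if r["revenue"] else 0.0  # fill gaps
    return r

clean = [cleanse(r) for r in records]
```

Even rules this simple mean downstream analysts can filter on `valid_email` instead of discovering bad rows halfway through an analysis.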
2. World explorers didn’t create maps for data
If you’re a driver, chances are you rely on Google Maps or Waze. Not only do navigation tools get you from A to B using the most efficient route, you can also use them to find landmarks and services along the way, such as the closest café or gas station. Imagine if you had something similar for the data across your enterprise.
Enterprise data catalogs scan and collect metadata from enterprise systems—including many types of databases, applications, and tools—and automatically build out a metadata and relationship graph exposed via REST APIs. The result is that end users and developers can query metadata for other applications or integrations. AI- and machine-learning-driven data catalogs discover and classify data, so users get an intuitive search experience. It’s like Google for all of your enterprise data. Check out this Informatica blog post “Data Cataloging is the First Step,” which explains more.
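Here is a toy sketch of what a catalog’s metadata and relationship graph enables once it’s built. The asset names, tags, and lineage edges below are entirely made up for illustration; a real catalog would populate them from automated scans and expose them through its REST API rather than an in-memory dictionary.

```python
# Sketch: keyword search and lineage traversal over a tiny metadata graph.
# All assets and relationships are hypothetical.

assets = {
    "sales_db.orders":   {"type": "table",  "tags": ["sales", "orders"]},
    "lake.raw_orders":   {"type": "file",   "tags": ["orders", "raw"]},
    "bi.revenue_report": {"type": "report", "tags": ["sales", "revenue"]},
}

lineage = {  # upstream asset -> downstream assets that consume it
    "lake.raw_orders": ["sales_db.orders"],
    "sales_db.orders": ["bi.revenue_report"],
}

def search(keyword):
    """Google-style keyword search over asset names and tags."""
    return sorted(name for name, meta in assets.items()
                  if keyword in name or keyword in meta["tags"])

def downstream(name, seen=None):
    """Walk the relationship graph to find everything an asset feeds."""
    seen = set() if seen is None else seen
    for child in lineage.get(name, []):
        if child not in seen:
            seen.add(child)
            downstream(child, seen)
    return seen
```

The payoff of the graph structure is the `downstream` walk: before changing a raw file in the lake, you can see every table and report that depends on it.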
3. Data scientists aren’t adequately equipped for quick data access
As I mentioned in my previous blog post, self-service analytics has transformed traditional data management. Data preparation tools empower data scientists and other data power users to quickly find and analyze the data they need because they can prepare the data themselves. Users should be able to merge, transform, and cleanse relevant data into more trusted and certified forms so they’re in a better position to analyze the data. And this helps the business operationalize the data because users can publish their prepared datasets back into collaborative workspaces. This way, multiple business stakeholders can access and prepare the data together.
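The merge-transform-publish flow described above can be sketched in a few lines of Python. The datasets, the join logic, and the `workspace` dictionary standing in for a collaborative workspace are all hypothetical, chosen only to show the shape of the flow.

```python
# Sketch of a self-service prep flow: merge two sources, transform,
# and "publish" the result to a shared workspace. Names are illustrative.

customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders    = [{"cust_id": 1, "amount": 500}, {"cust_id": 1, "amount": 250}]

def prepare():
    totals = {}
    for o in orders:                      # transform: aggregate order amounts
        totals[o["cust_id"]] = totals.get(o["cust_id"], 0) + o["amount"]
    return [                              # merge: join customers with totals
        {"name": c["name"], "total": totals.get(c["id"], 0)}
        for c in customers
    ]

workspace = {}                            # stand-in for a collaborative workspace
workspace["certified_customer_totals"] = prepare()  # publish for reuse
```

The publish step is what turns a one-off experiment into something operational: once the prepared dataset sits in a shared workspace under a known name, other stakeholders can build on it instead of redoing the prep.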
I’ve listed just three reasons why data lakes have yet to prove business value. But there are nine design principles for data lake management that put you on a better footing to create data lakes that do provide business value. I’d highly recommend downloading our white paper, “The CDO’s Guide to Intelligent Data Lake Management,” for a deep dive into the nine principles.
Until next time.