Humans or Machines? You Can Have the Best of Both with an Enterprise Data Catalog

“Water, water everywhere, nor any drop to drink” goes the oft-repeated line from Samuel Taylor Coleridge’s famous poem “The Rime of the Ancient Mariner.” The same could be said about the challenge of finding relevant data in a sea of data as enterprises embark on digital transformation of their businesses.

Digitization of business is leading to a data deluge, and there's data everywhere across the enterprise: traditional on-premises systems, a variety of multi-cloud environments, and new cloud and on-prem big data stores and data lakes as well. Data is coming at you in varied types and formats. But how does a data or business analyst find the right data for her analysis in this complex and constantly evolving data landscape? And how do you make this data available on demand to support different business needs? The first step in addressing this problem is understanding what data you have and where it lives, and providing that visibility to data consumers across the enterprise. With hundreds, thousands or even more datasets to deal with in a typical enterprise, this is a problem that can no longer be addressed with manual processes. So how are enterprises dealing with this challenge?

Data cataloging at enterprise scale

Intelligent data catalogs can automatically scan and catalog assets across the enterprise and enable discovery through a simple search. But it's not enough to just provide visibility into all data. Users have to be guided to the most relevant and trusted data. That requires visibility into what each dataset is, where it came from, the quality of the data and how it's related to other data. Doing this at enterprise scale requires machine intelligence powered by a deep understanding of metadata from varied types of data repositories, data integration tools, business intelligence tools and applications distributed across multi-cloud and on-premises environments. It requires intelligence to guide users to what they don't know, for instance, recommending alternate or complementary datasets that would enhance their analysis. It requires adding business context to the technical data assets at scale, so they can be tied to a data governance and compliance initiative.

We need machine intelligence leveraging enterprise-wide metadata to address this challenge at enterprise scale. Is that enough? What about the tribal knowledge and contextual understanding that is often siloed across the enterprise in different departments and locations? A search using simple business terms may bring up a hundred relevant datasets but only a handful may be most relevant for the analyst’s needs. And that hard work to winnow down the list may already have been done by experienced analysts in other locations or functions. They may offer additional business context and nuanced understanding of the data that vastly improves the analyst’s efficiency.

There is no need to reinvent the wheel. Instead, we need a way to leverage this deep, contextual knowledge, and be able to bring that to the forefront. At the same time, we have to enable this without losing the scale and performance we get from machine intelligence. The shared data knowledge has to be combined with machine intelligence to enrich and curate data and guide user experience. It requires harnessing the combined power of AI and human expertise.
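To make the idea concrete, here is a minimal sketch of how a machine relevance score and human curation signals might be blended when ranking search results. It is an illustration only, not a description of any product's implementation: the `Dataset` fields, the `certified` and `endorsements` signals, the keyword-matching score and the `human_weight` parameter are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    description: str
    certified: bool = False   # hypothetical: marked as trusted by a data steward
    endorsements: int = 0     # hypothetical: analysts who vouched for the dataset


def machine_score(query: str, ds: Dataset) -> float:
    """Fraction of query terms found in the dataset's name or description."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    text = f"{ds.name} {ds.description}".lower()
    return sum(term in text for term in terms) / len(terms)


def human_score(ds: Dataset) -> float:
    """Curation signal in [0, 1]: certification plus endorsements capped at 10."""
    return 0.5 * ds.certified + 0.5 * min(ds.endorsements, 10) / 10


def rank(query: str, datasets: list[Dataset], human_weight: float = 0.4) -> list[str]:
    """Order machine-relevant datasets by a weighted blend of both scores."""
    scored = []
    for ds in datasets:
        m = machine_score(query, ds)
        if m == 0:
            continue  # human signals only reorder results the machine found relevant
        blended = (1 - human_weight) * m + human_weight * human_score(ds)
        scored.append((blended, ds.name))
    return [name for _, name in sorted(scored, key=lambda pair: -pair[0])]


# Made-up catalog entries, for illustration only.
catalog = [
    Dataset("customer_orders_raw", "raw customer orders export"),
    Dataset("customer_orders_clean", "deduplicated customer orders",
            certified=True, endorsements=8),
    Dataset("hr_payroll", "monthly payroll records",
            certified=True, endorsements=10),
]

results = rank("customer orders", catalog)
print(results)
```

In this sketch, both order datasets match the search terms equally well, so the steward certification and analyst endorsements decide which one surfaces first. The payroll dataset never appears, however well endorsed, because curation only reorders results the machine already judged relevant.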

To learn more about how Informatica’s Enterprise Data Catalog delivers on this promise, register for our Spring Virtual Launch event:
