Driving Metadata Intelligence with Informatica’s Enterprise Data Catalog
There hasn’t been a more exciting time for metadata, at least not since President Obama referred to metadata in the context of phone calls and gave it a whiff of notoriety. Now metadata, best exemplified by the data catalog, is getting the red-carpet treatment (swathed in black, of course: see Gartner’s “Data Catalogs Are the New Black”).
The need for a data catalog to inventory, classify and document an organization’s data has increased sharply. Just a few years ago, the world was simpler: the number and complexity of applications and analytical systems in an enterprise were limited. People could keep the information about their data in their heads or document it manually. Tedious but doable.
But things have changed dramatically. Driven by cloud and big data, the complexity and scale of organizations’ data landscapes have exploded. An enterprise may now have dozens of SaaS apps, big data deployments, IoT (Internet of Things) systems, external data providers, cloud-based data warehouses and data lakes. The variety is bewildering, and it’s next to impossible to understand the various systems, the data they contain and the relationships among them. At the same time, users increasingly want to find and use the data they need as quickly as possible.
That’s where we believe a modern data catalog plays an important role: it provides a means to understand all the information available in an enterprise, along with its quality, privacy and relevance. Users across the enterprise are empowered to find, understand and use the right, high-quality data for analytics, data science, governance or almost any other purpose.
At Informatica we believe catalogs are a critical part of all data management initiatives. In Informatica’s Enterprise Data Catalog (EDC) we focus on three core capabilities:
- Get broad and deep metadata quickly by connecting to any enterprise system across cloud, on-premises and big data environments, covering structured and unstructured data, files, applications, and analytical and data integration systems
- Combine the power of machine learning and human knowledge to automatically and efficiently classify, label, curate and relate data, at scale
- Make EDC’s rich metadata content available to both business and IT users from right within their applications and analytical tools
It has been an incredibly spirited time for the metadata teams at Informatica. We are fortunate to see such tremendous interest in and uptake of Enterprise Data Catalog. Adding more than 100 customers over the past year has been hectic but exciting. At the same time, the team has been busy planning, developing and releasing the next version of Enterprise Data Catalog. The Spring 2018 release adds many new features that meet the increasing needs of our customers and advance its critical capabilities, including:
- Additional REST-based APIs that improve the integration of Enterprise Data Catalog within user applications
- Expanded connectivity to new data sources: file systems (Azure Blob, Azure Data Lake Store, MapR file systems, SharePoint, OneDrive), BI (QlikView), and PL/SQL
- Improved AI capabilities powered by our CLAIRE™ engine, which help data curators quickly associate business terms with datasets and help business users quickly find related data assets: an overall similarity score, smart domain discovery, and unsupervised clustering of similar columns using additional factors such as synonyms, unique values and data patterns
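To make the idea of a column similarity score more concrete, here is a minimal Python sketch. It is not how CLAIRE™ works internally; it only illustrates how the three factors named above (synonyms, unique values and data patterns) could in principle be combined into one score and used to group similar columns. The synonym groups, weights, threshold and all function names are assumptions made up for this illustration.

```python
import re

# Hypothetical synonym groups; a real catalog would draw these from a business glossary.
SYNONYMS = [{"phone", "telephone", "tel"}, {"zip", "postcode", "postal_code"}]


def jaccard(a, b):
    """Overlap of two collections: size of intersection divided by size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    shaped = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", shaped)


def name_score(name_a, name_b):
    """1.0 if the names match or share a synonym group, otherwise 0.0."""
    name_a, name_b = name_a.lower(), name_b.lower()
    if name_a == name_b:
        return 1.0
    return 1.0 if any(name_a in g and name_b in g for g in SYNONYMS) else 0.0


def similarity(col_a, col_b, weights=(0.3, 0.4, 0.3)):
    """Weighted mix of name, unique-value and pattern similarity (weights are assumed)."""
    (name_a, vals_a), (name_b, vals_b) = col_a, col_b
    w_name, w_vals, w_pat = weights
    return (
        w_name * name_score(name_a, name_b)
        + w_vals * jaccard(vals_a, vals_b)
        + w_pat * jaccard(map(pattern, vals_a), map(pattern, vals_b))
    )


def cluster(columns, threshold=0.5):
    """Greedy grouping: a column joins the first group holding a similar enough member."""
    clusters = []
    for col in columns:
        for group in clusters:
            if any(similarity(col, other) >= threshold for other in group):
                group.append(col)
                break
        else:
            clusters.append([col])
    return clusters


columns = [
    ("phone", ["555-1234", "555-9876"]),
    ("telephone", ["555-1234", "555-0000"]),
    ("zip", ["94105", "10001"]),
]
groups = cluster(columns)
```

In this toy data set the two phone columns share a synonym group, one value and the same value pattern, so they end up in one group, while the zip column stands alone. A production system would need far more robust signals and a proper clustering method.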
Internally we sometimes refer to EDC as the Sun, with each planet being a data management initiative. Be it data warehousing, master data management, data lakes, analytics, data science or data migration, all these projects are warmed by the knowledge and understanding of the data that the catalog provides. Or we can skip the analogies and just bask in the fact that catalogs today are hipper than Cardi B… or at least avocado toast.