Extract Metadata and Data Lineage from Your Proprietary and Custom Data Sources

Last Published: Aug 05, 2021

Maneeza Malik

Data cataloging is a critical component of any data or metadata management strategy. Central to data cataloging is metadata connectivity. It helps establish the unified metadata foundation required to drive successful data-driven digital transformation initiatives with intelligence and automation.

At Informatica, we have been working with many of the leading Global 2000 enterprises to help them intelligently catalog and govern all their data at scale across a highly complex data landscape. The Informatica AI-powered Enterprise Data Catalog offers the broadest and deepest metadata connectivity.

Metadata management for complex data sources

Our customers across verticals and geographies have hundreds if not thousands of data sources spread across on-premises and multi-cloud environments, creating a complex data landscape. These data sources include many of the industry’s leading:

Databases, cloud data warehouses, and cloud data lakes
Complex enterprise applications and mainframe systems
Apache Hadoop, Spark, and Kafka clusters
BI tools and multi-vendor ETL tools, to name a few

In addition to these mainstream data sources, most enterprises keep a significant share of their data and metadata in what we refer to as secondary data sources. These include proprietary solutions, Microsoft Excel spreadsheets, and file formats such as JSON, XML, and CSV.

A key hurdle that comes with many of these secondary data sources is that they contain a lot of custom content and custom code that is difficult to surface and even harder to extract. In regulated industries, this often becomes a big problem, as enterprises need access to all their metadata and the ability to obtain data lineage at a granular level for regulatory compliance and reporting purposes.

Often, enterprises are forced to choose between two poor options. They can leave this valuable data untapped, which creates gaps in their ability to extract value from it. Or they can embark on time-consuming and costly IT-led initiatives to build and maintain makeshift metadata connectors. These connectors require extensive manual manipulation and lengthy programming and scripting cycles, offer limited automation, and have virtually no ability to scale.

Our goal has always been to enable our customers to scan and extract metadata and data lineage with in-depth information at a granular level from all their primary and secondary data sources so that they can build the most holistic and complete data catalog to support their business imperatives such as:

Accelerating their digital transformation journey
Enabling enterprise-wide data governance and regulatory compliance
Democratizing the use of high-quality and fully governed data in order to deliver timely and accurate insights
Fostering greater collaboration and innovation on data
Facilitating speedy cloud data warehouse and data lake modernization endeavors
Effectively deploying Business 360, data science, and AI initiatives at scale

To this end, over the summer, we added advanced scanners to our ever-growing list of metadata connectors. The Informatica Enterprise Data Catalog Advanced Scanners allow our customers to:

Parse code from various stored procedures for databases, complex applications, and mainframe systems as well as multi-vendor ETL tools
Extract deep metadata, data lineage, and data relationships at scale from both static and dynamic code
Obtain complete visibility into stored procedure calls with parameter tracking in dynamic SQL code
Automatically derive data lineage for different code paths based on parameter values, database queries, and more
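
To make the idea of parameter-dependent lineage concrete, here is a minimal Python sketch. It is only an illustration of the concept and does not reflect how the Advanced Scanners are implemented. The function `build_load_sql`, the table names, and the regular expression are all hypothetical; a real scanner would need a full SQL parser rather than a single pattern.

```python
import re

def build_load_sql(region):
    """Hypothetical stand-in for a stored procedure that assembles dynamic SQL.

    The source table depends on the value of the `region` parameter, so the
    lineage of the target table differs from one code path to the other.
    """
    source = "sales_emea" if region == "EMEA" else "sales_global"
    return f"INSERT INTO reporting.sales_summary SELECT * FROM {source}"

# Naive pattern for one statement shape: INSERT INTO <target> SELECT ... FROM <source>.
# This is an assumption made for the sketch, not a general SQL grammar.
INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+([\w.]+)\s+SELECT\s+.*?\s+FROM\s+([\w.]+)",
    re.IGNORECASE | re.DOTALL,
)

def lineage_for(sql):
    """Return a (source, target) pair for the statement, or None if it does not match."""
    match = INSERT_SELECT.search(sql)
    if match is None:
        return None
    target, source = match.groups()
    return (source, target)

# Evaluate the procedure for each parameter value and record the resulting lineage.
for region in ("EMEA", "APAC"):
    print(region, lineage_for(build_load_sql(region)))
```

Running the loop shows that the same procedure yields a different source table for each parameter value, which is the kind of branching that static inspection of the procedure text alone would miss.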

Import data lineage directly into your data catalog

To address the secondary data sources, our Enterprise Data Catalog customers can leverage the Enterprise Data Catalog Advanced Custom Metadata Loader. The Advanced Custom Metadata Loader is specifically designed to easily surface, scan, and extract metadata and data lineage from proprietary solutions (these are either home-grown or vendor-based solutions that have been heavily customized over the years), custom code, and custom content residing in relational databases, Microsoft Excel spreadsheets, and various file formats such as XML, JSON, and CSV files.

For instance, with the Advanced Custom Metadata Loader, users can rapidly define custom metadata models, populate them automatically, and import metadata in any form. Moreover, existing data lineage documentation that resides in Excel spreadsheets or in other data sources, such as relational databases or CSV files, can be loaded directly into the Enterprise Data Catalog without any additional custom modeling or programming.
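
As a rough illustration of what lineage documentation in a CSV file can look like, and what becomes possible once it is in a structured form, consider the Python sketch below. It is not the Advanced Custom Metadata Loader's format or API. The column names (`source`, `target`, `transformation`), the sample rows, and the helper functions are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical lineage documentation, as it might be exported from a spreadsheet.
# The column names and rows are assumptions for this sketch, not a product format.
LINEAGE_CSV = """source,target,transformation
crm.customers.email,staging.customers.email,copy
staging.customers.email,dw.dim_customer.email_hash,sha256
erp.orders.amount,dw.fact_sales.amount_usd,currency conversion
"""

def load_lineage(text):
    """Parse source-to-target rows into a mapping: source -> [(target, transformation)]."""
    edges = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        edges[row["source"].strip()].append(
            (row["target"].strip(), row["transformation"].strip())
        )
    return edges

def downstream(edges, start):
    """Return every element reachable from `start`, in breadth-first order."""
    seen, queue, order = {start}, [start], []
    while queue:
        node = queue.pop(0)
        for target, _ in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

edges = load_lineage(LINEAGE_CSV)
print(downstream(edges, "crm.customers.email"))
```

Once the rows are held as a graph, questions such as "which downstream columns depend on this field?" reduce to a simple traversal, which is the kind of query a data catalog answers at much larger scale.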

The Enterprise Data Catalog Advanced Custom Metadata Loader allows users to:

Easily create a metadata model and populate it automatically in Informatica Enterprise Data Catalog in hours rather than months of development time
Obtain complete auditing and governance control over the entire metadata extraction and loading process and eliminate blind spots
Gain full visibility across all their custom metadata sources and create a holistic data catalog

Next steps

To learn more, I invite you to read the Enterprise Data Catalog Advanced Custom Metadata Loader datasheet and watch the on-demand webinar "Leave No Metadata Behind," in which experts from Informatica and Westpac Bank discuss best practices for extracting metadata from complex enterprise systems with the help of the Enterprise Data Catalog Advanced Scanners.

First Published: Oct 19, 2020