Deep Metadata Extraction and Data Lineage with an AI-Powered Data Catalog

Last Published: Aug 05, 2021

Maneeza Malik

Data lineage is key to building trust in data

The importance of data lineage and deep metadata extraction cannot be overstated. Data lineage is key to building trust in data. Without it, it would be virtually impossible for any enterprise to derive value from its data or to embark at scale on strategic initiatives such as data governance and compliance, self-service analytics, or data warehouse modernization.

At an aggregate level, data lineage serves as a detailed map of all your data, enabling you to navigate across an increasingly complex data landscape. With petabyte-scale data that’s spread across hundreds of disparate data sources in a hybrid and multi-cloud environment, automated data lineage allows you to very quickly visualize and trace the flow of your data at scale. It enables you to understand and keep track of the many transformations each dataset may have undergone throughout its lifecycle and across the data pipeline from source to target.
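To make the "map" idea concrete, here is a minimal sketch of lineage as a directed graph in which nodes are datasets and edges are transformations. It is an illustration only, not how the Enterprise Data Catalog stores lineage; the `LineageGraph` class and every dataset name below are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage map: nodes are datasets, edges are labelled transformations."""

    def __init__(self):
        # target dataset -> list of (source dataset, transformation label)
        self._upstream = defaultdict(list)

    def add_edge(self, source, target, transformation):
        self._upstream[target].append((source, transformation))

    def trace_upstream(self, dataset):
        """Return every (source, target, transformation) hop feeding `dataset`."""
        hops, seen, queue = [], {dataset}, deque([dataset])
        while queue:
            current = queue.popleft()
            for source, transformation in self._upstream.get(current, []):
                hops.append((source, current, transformation))
                if source not in seen:
                    seen.add(source)
                    queue.append(source)
        return hops

    def origins(self, dataset):
        """Return the datasets in the upstream trace that have no upstream themselves."""
        nodes = {dataset} | {source for source, _, _ in self.trace_upstream(dataset)}
        return sorted(node for node in nodes if not self._upstream.get(node))


# Hypothetical pipeline: two operational sources feeding one report.
lineage = LineageGraph()
lineage.add_edge("crm.customers", "staging.customers", "nightly extract")
lineage.add_edge("erp.orders", "staging.orders", "nightly extract")
lineage.add_edge("staging.customers", "warehouse.customer_orders", "join on customer_id")
lineage.add_edge("staging.orders", "warehouse.customer_orders", "join on customer_id")
lineage.add_edge("warehouse.customer_orders", "reports.revenue_by_region", "aggregate by region")

for source, target, transformation in lineage.trace_upstream("reports.revenue_by_region"):
    print(f"{source} -> {target} ({transformation})")
```

Walking the graph backwards from a report answers "where did this number come from?", while `origins` lists the systems of record behind it. A production catalog would also have to handle cycles, versioning, and far larger graphs.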

As a leader in the metadata management category as rated by Gartner, we often hear from our customers, prospects and partners alike how vital broad metadata connectivity, deep metadata extraction and AI-powered data lineage and impact analysis are to their business. These capabilities allow them to drive successful data-driven business transformations at scale and to accelerate their digital transformation journey.

Data lineage plays a pivotal role in democratizing data and in enabling self-service analytics and data science initiatives. It allows users and data owners (data governance leads, data stewards, data analysts, data scientists, and data architects) to quickly obtain context on the data they have and to improve their confidence in it. They can acquire detailed information on who owns the data, where it originated, where it resides, and how it gets transformed along the way. Users can also gain insight into data dependencies and relationships, and identify which datasets may contain PII (personally identifiable information) and in which systems that data originates. End-to-end data lineage enables users to trace and understand the movement of sensitive data at a granular level across the enterprise.
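As a rough illustration of tracing sensitive data, the sketch below takes column-level lineage edges plus a set of columns already tagged as PII, and reports every downstream column that inherits data from them. This is an assumed, simplified model; `propagate_pii` and all column names are hypothetical and do not represent a product API.

```python
from collections import defaultdict, deque


def propagate_pii(edges, pii_sources):
    """Map each downstream column to the PII-tagged columns it derives from.

    edges: iterable of (source_column, target_column) lineage pairs.
    pii_sources: columns already tagged as containing PII.
    """
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)

    exposure = defaultdict(set)
    for origin in pii_sources:
        seen, queue = {origin}, deque([origin])
        while queue:
            current = queue.popleft()
            for nxt in downstream.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    exposure[nxt].add(origin)
                    queue.append(nxt)
    return {column: sorted(origins) for column, origins in exposure.items()}


# Hypothetical column-level lineage.
edges = [
    ("crm.customers.email", "staging.customers.email"),
    ("crm.customers.ssn", "staging.customers.ssn_hash"),
    ("staging.customers.email", "warehouse.dim_customer.contact_email"),
    ("erp.orders.amount", "warehouse.fact_orders.amount"),
]
pii_exposure = propagate_pii(edges, {"crm.customers.email", "crm.customers.ssn"})

for column, origins in sorted(pii_exposure.items()):
    print(f"{column} inherits PII from {', '.join(origins)}")
```

The sketch is deliberately conservative: any column derived from a tagged column is flagged, even if a transformation such as hashing or masking may have de-identified it. Deciding when a transformation removes sensitivity is a policy question this example does not attempt to answer.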

Achieve Productivity Gains and Cost Savings

The ability to obtain detailed context on data within minutes or seconds rather than weeks has enabled many of our customers not only to achieve productivity gains of 2x or more but also to create better alignment between data consumers and IT by empowering both groups.

For instance, Anthem, a leading healthcare payer in the United States, is using the Enterprise Data Catalog to enable a host of strategic use cases including data governance and compliance, data warehouse modernization, and self-service analytics. In a webinar titled “Anthem: Intelligent Data Cataloging Journey & Adoption Best Practices,” our guest speaker from Anthem shared the vital role data cataloging and automated data lineage are playing in their digital transformation journey. With the Enterprise Data Catalog, over 2,000 users at Anthem (a 50/50 split between business and IT) are able to collaborate on data at scale. They can search, understand, and validate, in a timely manner, metadata that has already been cataloged from some 155 data sources. Moreover, end-to-end data lineage has given them full, detailed visibility into their data from source to target. Anthem estimates that, as a result, it is saving approximately $10 million USD annually.

Mitigate Risks and Drive Operational Agility

Deep metadata extraction coupled with end-to-end AI-powered data lineage has fast become a “must-have” capability for enterprises to support their data governance and regulatory compliance initiatives.

In regulated industries such as financial services, auditors and regulatory authorities no longer rely on aggregated results that are shared in reports. Increasingly, they want detailed information to ensure that the underlying data that’s used to produce those reports is as complete, appropriate, and accurate as possible.

Regulations such as BCBS 239, CCAR and Basel III have stringent requirements for comprehensive audit trails to ensure accurate risk reporting. Financial institutions must be prepared to provide details that stem from end-to-end data lineage to support risk metrics such as Value at Risk (VaR) that are aggregated from hundreds of data sources for reporting purposes. This requires the ability to easily trace and fully understand the flow of data, as well as the transformations the data may have undergone throughout its lifecycle, using in-depth technical lineage information, including table- and column-level lineage, from source to target.

For instance, a leading financial institution had a substantial portion of its data lineage embedded in stored procedures for Oracle and Microsoft SQL Server. The bank needed access to all of its metadata, including metadata lineage relationships, to support regulatory stipulations stemming from BCBS 239. With the Enterprise Data Catalog Advanced Scanners, it was able to parse code from the stored procedures and obtain granular details at scale. This allowed the bank’s compliance officers and other stakeholders to better understand how key data elements were computed. As a result, they were able to create comprehensive audit trails for reporting purposes while reducing impact analysis time from several weeks to minutes.
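To give a feel for what "lineage embedded in stored procedures" means, the following sketch pulls table-level lineage edges out of procedure text with regular expressions. It is a deliberately naive illustration and not how the Advanced Scanners work: a real scanner would need a full SQL parser to cope with comments, dynamic SQL, CTEs, `MERGE` and `UPDATE` statements, and column-level detail. The function name and the sample procedure are hypothetical.

```python
import re

# One "statement" runs from an INSERT INTO up to the next INSERT INTO or end of text.
INSERT_STATEMENT = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.\[\]]+).*?(?=INSERT\s+INTO|\Z)",
    re.IGNORECASE | re.DOTALL,
)
# Tables read by the statement: whatever follows FROM or JOIN.
SOURCE_TABLE = re.compile(r"\b(?:FROM|JOIN)\s+(?P<source>[\w.\[\]]+)", re.IGNORECASE)


def extract_table_lineage(procedure_sql):
    """Return sorted (source_table, target_table) pairs for INSERT ... SELECT statements."""
    edges = set()
    for statement in INSERT_STATEMENT.finditer(procedure_sql):
        target = statement.group("target")
        for match in SOURCE_TABLE.finditer(statement.group(0)):
            edges.add((match.group("source"), target))
    return sorted(edges)


# Hypothetical stored procedure used only for illustration.
SAMPLE_PROCEDURE = """
CREATE PROCEDURE dbo.load_risk_summary AS
BEGIN
    INSERT INTO dbo.risk_summary (desk, exposure)
    SELECT p.desk, SUM(p.notional * r.factor)
    FROM dbo.positions AS p
    JOIN dbo.risk_factors AS r ON r.instrument_id = p.instrument_id
    GROUP BY p.desk;

    INSERT INTO dbo.audit_log (loaded_at)
    SELECT GETDATE() FROM dbo.risk_summary;
END
"""

table_edges = extract_table_lineage(SAMPLE_PROCEDURE)
for source, target in table_edges:
    print(f"{source} -> {target}")
```

Even this toy version shows why such lineage is otherwise invisible: the link between `dbo.positions` and `dbo.risk_summary` exists only inside the procedure body, so it has to be recovered from code rather than from table definitions.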

Rabobank is another customer using the end-to-end data lineage capabilities in the Enterprise Data Catalog to support BCBS 239 requirements. Additionally, we recently had the opportunity to interview the VP of Data Architecture at East West Bank during a webinar on “Take Data Lineage to the Next Level.” Our guest speaker shared their vision and details on their use of the Enterprise Data Catalog as a foundational pillar for their data governance and compliance initiatives. Given the complexity of its data landscape and the need to comply with stringent regulatory stipulations, the bank is also using the Enterprise Data Catalog Advanced Scanners for Microsoft. The Advanced Scanners are purpose-built to extract deep metadata and data lineage from stored procedures for SQL Server, SSIS, SSAS and SSRS, to name a few. The benefits to the bank ranged from operational speed and agility to productivity gains and greater alignment between business and IT teams.

To Learn More

Given how important metadata and data lineage extraction are to driving successful data-driven digital transformation initiatives, it may interest you to know that we will be holding an Enterprise Data Catalog Advanced Scanners Webinar Series starting May 5, 2021. Attendees will have the opportunity to meet our product experts as well as guest speakers from CareSource. Our speakers will share best practices for extracting metadata and data lineage from some of the most complex systems and enterprise applications, including data sources from Microsoft and Teradata. We would love to meet you, and the series is also an opportunity for you to put questions to our experts.

Additionally, you may wish to read the eBook “AI-powered Data Lineage: The New Business Imperative,” read our latest blog, “Seven Metadata Management Best Practices Every Successful Data Leader Must Know,” and watch the on-demand webinar “Meet the Experts: Enterprise Data Catalog 10.5.”

First Published: Apr 26, 2021