The Next Generation of Metadata Management (Part 1)

This is Part 1 of a series on the Next Generation of Metadata Solutions. In this post, we begin with the roots: the need to revamp existing metadata management solutions.

History

Metadata Management originated in the late 80s and early 90s, when Data Warehousing was taking root. Businesses had expanded their definition of data warehousing to include three types of tools:

  • Business intelligence tools
  • Tools to extract, transform and load data into the repository
  • Tools to manage and retrieve metadata

The origins of Metadata Management lie here. The fundamental use case was to support the ETL tools in debugging and managing metadata. The 3 main use cases for early Metadata Management solutions were:

  1. Display of Data Lineage
  2. Impact Summary and Analysis
  3. Unified view in a Metadata Catalog

This meant that, in addition to being closely tied to the ETL tool used in IT, metadata management solutions served three purposes: increasing productivity through impact analysis, providing a visual representation of data movements through lineage, and storing all related metadata of the data warehousing solution in their own metadata catalog.
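To make lineage and impact analysis concrete, here is a minimal sketch in Python. It assumes the catalog stores lineage as a simple "feeds" map between datasets; the dataset names and the helper functions `impact_of` and `lineage_of` are hypothetical and chosen only for illustration. Impact analysis walks the map downstream, and lineage walks it upstream.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
FEEDS = {
    "crm.customers": ["staging.customers"],
    "erp.orders": ["staging.orders"],
    "staging.customers": ["dw.dim_customer"],
    "staging.orders": ["dw.fact_sales"],
    "dw.dim_customer": ["dw.fact_sales"],
    "dw.fact_sales": ["report.monthly_revenue"],
}


def _reverse(edges):
    """Invert the edge map so each dataset points at its direct sources."""
    reversed_edges = {}
    for source, targets in edges.items():
        for target in targets:
            reversed_edges.setdefault(target, []).append(source)
    return reversed_edges


def _reachable(start, edges):
    """Breadth-first walk returning every node reachable from start, sorted."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(seen)


def impact_of(dataset):
    """Impact analysis: everything downstream that a change could affect."""
    return _reachable(dataset, FEEDS)


def lineage_of(dataset):
    """Data lineage: every upstream source the dataset is derived from."""
    return _reachable(dataset, _reverse(FEEDS))
```

With this toy graph, `impact_of("staging.customers")` lists the dimension table, the fact table and the report that depend on it, while `lineage_of("dw.fact_sales")` traces the fact table back to the CRM and ERP sources.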

Role of Data Governance in the expansion of Metadata Management

Over the years, data volumes exploded and decision making increasingly relied on data analysis. In addition, business processes began to consume increasingly large amounts of data that had been created in distant parts of the organization for a different purpose. Then came the requirement to profile and discover the data and tie it to business outcomes. This was fundamental to the success of the Data Governance initiatives of the new millennium.

Various business-led governance initiatives recognized the need to care for enterprise-wide data and created collaborative processes to manage a core set of data deemed critical for the business. More significantly, the whole process was tied to a policy-centric approach to data quality standards, data security, Master Data Management (MDM) and lifecycle management.

Below is a graphical illustration of the 5 pillars of data governance. There was an increasing need to closely connect Metadata with business outcomes, including Data Quality and Data Security.

 

[Figure pic1: the 5 pillars of data governance]

 

Metadata and the Business Glossary had to be closely associated with each other. Otherwise, how would you connect, let’s say, a table attribute with an abstract name to a meaningful business definition? There was also a need for seamless collaboration between IT and the business, which called for a common solution to tie technical metadata to business definitions.
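As a rough illustration of that link, the sketch below maps a cryptic table attribute to a glossary entry. Everything in it is an assumption made for the example: the `GlossaryTerm` structure, the table and column names, the definition text and the `describe` helper are hypothetical, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """A business glossary entry (hypothetical structure)."""
    name: str
    definition: str
    steward: str  # business owner responsible for the definition


# Hypothetical business glossary, keyed by term name.
GLOSSARY = {
    "Customer Lifetime Value": GlossaryTerm(
        name="Customer Lifetime Value",
        definition="Projected net revenue from a customer over the relationship.",
        steward="Marketing Analytics",
    ),
}

# Hypothetical technical metadata: (table, column) mapped to a glossary term.
COLUMN_TO_TERM = {
    ("dw.dim_customer", "cust_ltv_amt"): "Customer Lifetime Value",
}


def describe(table, column):
    """Return the business term behind a technical attribute, or None."""
    term_name = COLUMN_TO_TERM.get((table, column))
    if term_name is None:
        return None
    return GLOSSARY[term_name]
```

Here `describe("dw.dim_customer", "cust_ltv_amt")` resolves an abstract column name to a definition and a steward that a business user can act on, while an unmapped column returns `None` and flags a gap for the governance team.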

This meant that the metadata solution had moved one level up, tying closely to the data governance use case and catering to business-led initiatives. From that point on, metadata and the business glossary were inseparable within the data management context.

Role of Big Data in shaping the Enterprise Catalog solutions

The Metadata solutions didn’t stop there, did they?

When we look at how data flows through big data ecosystems, the focus on data governance remains: business analysts still want to search for what they need and pick from the results based on either relevance or how much they trust the data. This again ties back to data lineage (where the data came from), to the definitions and semantics of what the data means, and to when it was last updated or defined.

The next big factor is data security: there is a need to understand who has been looking at what data, and whether or not they were allowed to, while providing authentication and authorization for the data being accessed.
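The "who looked at what, and were they allowed to" question can be sketched as a check of an access log against access grants. This is only a toy model: the `GRANTS` map, the `ACCESS_LOG` entries, the user names and the `unauthorized_reads` helper are hypothetical, and a real platform would draw both from its own security and audit services.

```python
# Hypothetical grants: user mapped to the set of datasets they may read.
GRANTS = {
    "analyst_a": {"dw.fact_sales", "dw.dim_customer"},
    "analyst_b": {"dw.fact_sales"},
}

# Hypothetical access log: (user, dataset) pairs in the order they occurred.
ACCESS_LOG = [
    ("analyst_a", "dw.fact_sales"),
    ("analyst_b", "dw.dim_customer"),
    ("contractor_c", "dw.fact_sales"),
]


def unauthorized_reads(log, grants):
    """Return the log entries where the user had no grant for the dataset."""
    return [
        (user, dataset)
        for user, dataset in log
        if dataset not in grants.get(user, set())
    ]
```

Calling `unauthorized_reads(ACCESS_LOG, GRANTS)` on this sample data flags the read by `analyst_b` on the customer dimension and the read by the unknown `contractor_c`.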

Where is all this information stored? In the Enterprise Metadata Warehouse, or Enterprise Metadata Catalog, which is the central concept of the next generation of Metadata Management solutions.

Retrospective View of Metadata Management vis-à-vis the Next-Gen Solutions

To briefly sum up the evolution of metadata management: it is the coalescing of next-gen technologies, including machine learning, with the policy-driven era of data governance. The core intelligence of these platforms relies on the metadata warehouse, as the snapshot below illustrates:

[Figure pic2: snapshot of the metadata warehouse as the core of the platform]

In the next part of the series, we take a detailed look at the need for an enterprise-wide catalog.

Related Post:

Part 2: The Next Generation of Metadata Management (Part 2)

Comments

  • Mayank Srivastava

    Good article, but I am not very convinced by the pillars of data governance. I am not sure why you considered metadata and business glossary separately. A business glossary is itself business metadata. Secondly, for better data governance, my suggestion would be to refer to the DAMA DMBOK and consider adding all of its items into the structure. I really like the idea of using Big Data technologies and the capabilities of machine learning for metadata management, something which I have been advocating for some time.

    • Kris Meukens

      There is no such thing as “business metadata”. To business, it is all just data. The need for multiple meta levels arises when building information technology solutions. Metadata is often said to be “data about data”, but that is misleading. Some data provides the means to define the “semantics” and “structure” of other data. Hence my definition of metadata: metadata is data that has been selected to be made machine understandable and which consequently defines, describes, explains and/or locates the structures that “contain” other data, and contributes to the utility of data throughout its lifecycle.