The Next Generation of Metadata Management (Part 2)

In Part 1 of this series, we began with the roots of Metadata Management. In this post, we look at the need to have an enterprise metadata cataloging solution.

Big Data’s close relation to Metadata Management

We are in the era of Big Data, which brings next-generation capabilities in areas such as machine learning, predictive analytics and statistical analysis. However, these very powerful big data processing and storage capabilities are continually changing, which makes it hard to see how data is integrated across them. The central underlying technology is Metadata. Having a universal metadata catalog gives us this transparency: a view of how the integration is done.

Also, the Hadoop ecosystem is constantly evolving, and most of its platforms do not hold any descriptive information about their data. Therefore, the data needs to be discovered and relationships need to be inferred rather than ‘presented’ readily to the users. This use case highlights the key need for a Catalog, as part of Metadata Management, that collects data assets across the enterprise.

360-degree relationship discovery

A 360-degree view of data is necessary to easily search, discover, and understand enterprise data and meaningful data relationships. The discovery process also involves finding related data sets along with their technical, business, semantic and usage-based relationships. Below are commonly encountered use cases:

  • Finding relevant data sets with powerful semantic search capabilities
  • Discovery of sensitive elements within the big data landscape
  • The need to discover similar data assets
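
The use cases above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration and not part of any particular catalog product: the DataAsset record, the sample assets, the name patterns used to flag sensitive elements, and the helper functions. Plain keyword matching stands in for semantic search, and column-name overlap (Jaccard similarity) stands in for similarity discovery; a real catalog would use richer techniques.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One hypothetical catalog entry: a data set plus its harvested metadata."""
    name: str
    source: str
    columns: list
    tags: set = field(default_factory=set)


# Assumed name patterns that hint at a sensitive element (illustrative only).
SENSITIVE_PATTERNS = [
    re.compile(p, re.IGNORECASE) for p in (r"ssn", r"e-?mail", r"phone", r"birth")
]


def search(catalog, term):
    """Return names of assets whose name, columns, or tags mention the term."""
    term = term.lower()
    hits = []
    for asset in catalog:
        haystack = [asset.name, *asset.columns, *asset.tags]
        if any(term in text.lower() for text in haystack):
            hits.append(asset.name)
    return hits


def sensitive_columns(asset):
    """Return the columns whose names match one of the assumed patterns."""
    return [c for c in asset.columns
            if any(p.search(c) for p in SENSITIVE_PATTERNS)]


def similarity(a, b):
    """Jaccard similarity of the two assets' column-name sets."""
    cols_a = {c.lower() for c in a.columns}
    cols_b = {c.lower() for c in b.columns}
    union = cols_a | cols_b
    return len(cols_a & cols_b) / len(union) if union else 0.0


# Made-up sample assets from three different platforms.
catalog = [
    DataAsset("crm_customers", "oracle",
              ["customer_id", "email", "birth_date"], {"customer"}),
    DataAsset("web_signups", "hdfs",
              ["customer_id", "email", "signup_ts"], {"marketing"}),
    DataAsset("store_sales", "hive",
              ["order_id", "customer_id", "amount"], {"sales"}),
]

if __name__ == "__main__":
    print(search(catalog, "email"))
    print(sensitive_columns(catalog[0]))
    print(similarity(catalog[0], catalog[1]))
```

Because all three helpers work only on metadata (names, columns, tags), the same functions apply to assets from any platform once their metadata has been collected into one place, which is the point of an enterprise-wide catalog.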

Metadata Logical Architecture: Data Flow

The diagram below illustrates the logical metadata architecture, where data flows from the various sources all the way through to the BI layer:


The cataloging solution is not just spread across the data lake but across the entire enterprise. In short, the entire backbone of enterprise data management rests on the metadata warehouse. Therefore, the ability to discover data and perform powerful semantic searches depends on a core architecture that is metadata-driven. As the Hadoop ecosystem continues to evolve, we believe that implementing a metadata-driven approach for Big Data initiatives will result in more value, more credibility and ultimately a better understanding of the data relationships in your enterprise.

In the next part of the series, we will take a detailed look at how “search” plays a key role in finding data assets in the catalog.

Related Post:

Part 1: The Next Generation of Metadata Management