Metadata has the same challenges as data: it is created in silos, it is full of variation and inconsistency, it is growing exponentially, and it is of little value if not managed. For example:
- An entity relationship diagram in a data modeling tool is metadata about a database.
- An application portfolio in an EA repository is metadata about systems.
- Server information in a CMDB is metadata about IT assets.
- Mapping information in PowerCenter is metadata about data lineage.
- Data profiles on a data quality scorecard are metadata about the quality of information.
- An XML schema in a service registry is metadata about a canonical message format.
- A business glossary in a metadata repository is metadata about the definition of business data.
- The status of a new BI report on a project dashboard is metadata about how data is changing.
The list goes on. To build on Tuesday’s post about the Big Uptick in Customer Metadata Projects, all of these examples describe data in motion, data at rest, or the process of changing data.
Metadata is the equivalent of a card catalog for books. If you have just one bookshelf full of books, you probably don’t need an index; you can quickly scan the shelf to find the book you need. If you have a library with hundreds of shelves and thousands of books, then you need a card catalog. If your card catalog is manual (like a spreadsheet), it can tell you that a book was acquired, but when the book is not on the shelf where it’s supposed to be, the catalog cannot tell you whether it is loaned out, misfiled, or stolen. A better approach is to have an electronic catalog integrated with a membership and administration system that keeps track of the whereabouts of each book. If the items we are inventorying are data rather than books, then we call the catalog a metadata repository.
So does this mean that we should put all our metadata into one repository, the equivalent of an enterprise data warehouse for metadata? The short answer is NO. You need multiple repositories because the metamodel, editing tools and notation conventions for each of the metadata domains are very different. If you try to put everything in one tool, the metamodel and the user interface become so large that they collapse under the weight of their own complexity.
The solution is to use a federated approach, in which each type of metadata is maintained in a repository and tool that is tuned for a specific purpose. That said, you need a way to “connect the dots” between the various metadata repositories in order to address specific business needs. For example, if you plan to upgrade an application, you need to know what up-stream and down-stream information flows are impacted, what servers and databases are associated with the system, what enterprise data definitions will change and which software tools and OS dependencies it has. You need a strategy for linking all the disparate sources of metadata so that you can analyze across the multiple repositories.
The way to link them is to establish a small number of common keys that can be used to join data from multiple sources. The most frequently used ones are Application ID, Server ID and Database ID. These serve as the equivalent of a General Ledger chart of accounts: a way to organize information consistently.
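To make the common-key idea concrete, here is a minimal Python sketch of an impact analysis for an application upgrade. It is not based on any Informatica API. The record layouts, field names (`app_id`, `server_id`, `database_id`) and sample IDs are all hypothetical, standing in for extracts from an EA repository, a CMDB, a data modeling tool and a lineage tool.

```python
# Hypothetical extracts from separate metadata repositories. Each record
# carries one or more of the shared keys, which is what makes a join possible.
SERVERS = [  # e.g. from a CMDB
    {"server_id": "SRV-10", "app_id": "APP-001", "os": "Linux"},
    {"server_id": "SRV-11", "app_id": "APP-002", "os": "Windows"},
]
DATABASES = [  # e.g. from a data modeling tool
    {"database_id": "DB-7", "server_id": "SRV-10", "app_id": "APP-001"},
    {"database_id": "DB-8", "server_id": "SRV-11", "app_id": "APP-002"},
]
FLOWS = [  # e.g. lineage: data moves from the source app to the target app
    {"source_app_id": "APP-002", "target_app_id": "APP-001"},
    {"source_app_id": "APP-001", "target_app_id": "APP-003"},
]


def impact_of(app_id):
    """Collect everything linked to one application via the common keys."""
    return {
        "servers": sorted(
            s["server_id"] for s in SERVERS if s["app_id"] == app_id
        ),
        "databases": sorted(
            d["database_id"] for d in DATABASES if d["app_id"] == app_id
        ),
        # Applications that feed data into this one.
        "upstream": sorted(
            f["source_app_id"] for f in FLOWS if f["target_app_id"] == app_id
        ),
        # Applications that consume data from this one.
        "downstream": sorted(
            f["target_app_id"] for f in FLOWS if f["source_app_id"] == app_id
        ),
    }


if __name__ == "__main__":
    print(impact_of("APP-001"))
```

Each source keeps its own structure; only the keys have to be agreed on. In practice the lists would be replaced by queries or exports from the individual repositories.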
The good news is that Informatica’s Metadata Manager and Business Glossary capabilities (part of PowerCenter Advanced Edition) are a great way to implement a federated metadata strategy, realize master metadata and solve real business problems.
- Metadata Manager for IT users to visualize, document and manage change in their environment.
- Business Glossary for business users to create and manage business metadata – adding business context to the organization.
- Extensible metamodel and custom X-connects for administrators to exchange metadata with virtually any other repository.
Ultimately, when master metadata is combined with the Informatica Platform, it becomes a key ingredient for “one version of the truth” and for turning data into an asset for competitive advantage.