In many ways, a data warehouse resembles the children’s game broken telephone, where a message is distributed across a group by being whispered into the ear of one player after another, until the last player announces the message to the entire group. Since errors typically accumulate in the retellings, the final version of the message often differs significantly from its source. Some players appear to deliberately alter what’s being said, guaranteeing a garbled message by the end of the game.
As data journeys from operational sources through the staging area, the data warehouse, the data marts, and finally into dashboards and reports, a lot can be lost in translation. As it is processed, data is often deliberately altered to fit the structure of its next target. Every time data moves from one point to the next, semantic inconsistencies can be introduced.
Much more than “data about data,” metadata can be thought of as a translator providing a definition and context for data. Therefore, metadata plays an integral role in determining data usage. There’s also a strong relationship among metadata, data quality, and business intelligence. Metadata provides a context for evaluating the quality of data, and metadata provides a framing effect for interpreting the contents of the dashboards and reports involved in the decision-making process.
Commonly used terms like revenue and customer often complicate what on the surface seem like straightforward discussions, such as how much revenue was generated during a particular fiscal reporting period, or how many customers the organization has. These discussions often turn into heated debates over how terms like revenue and customer should be defined, and how their data should be integrated and aggregated to support the views displayed in dashboards and reports.
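To make the debate concrete, here is a minimal sketch of how two reasonable definitions of customer and of revenue give different answers from the same data. The order records, field names, reporting period, and function names are all hypothetical and chosen only for illustration; they do not come from any particular data warehouse.

```python
from datetime import date

# Hypothetical order records; fields and values are illustrative only.
orders = [
    {"customer_id": "C1", "order_date": date(2023, 1, 15), "amount": 100.0, "refunded": False},
    {"customer_id": "C2", "order_date": date(2023, 2, 3), "amount": 250.0, "refunded": True},
    {"customer_id": "C3", "order_date": date(2022, 11, 20), "amount": 75.0, "refunded": False},
    {"customer_id": "C1", "order_date": date(2023, 3, 9), "amount": 50.0, "refunded": False},
]

# Assumed fiscal reporting period for this example.
PERIOD_START = date(2023, 1, 1)
PERIOD_END = date(2023, 3, 31)


def in_period(order):
    """True if the order falls inside the assumed reporting period."""
    return PERIOD_START <= order["order_date"] <= PERIOD_END


def customers_ever(orders):
    """Definition A: anyone who has ever placed an order."""
    return len({o["customer_id"] for o in orders})


def customers_active(orders):
    """Definition B: anyone who placed an order during the period."""
    return len({o["customer_id"] for o in orders if in_period(o)})


def gross_revenue(orders):
    """Definition A: every order amount booked in the period."""
    return sum(o["amount"] for o in orders if in_period(o))


def net_revenue(orders):
    """Definition B: order amounts in the period, excluding refunded orders."""
    return sum(o["amount"] for o in orders if in_period(o) and not o["refunded"])


if __name__ == "__main__":
    print("Customers (ever ordered):", customers_ever(orders))
    print("Customers (active in period):", customers_active(orders))
    print("Revenue (gross):", gross_revenue(orders))
    print("Revenue (net of refunds):", net_revenue(orders))
```

Neither answer is wrong; each is correct for its own definition. Recording which definition a dashboard uses is exactly the kind of context metadata is meant to provide.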
Communication breakdowns prevent business problems and related data challenges from being well understood by the data warehouse team trying to solve them. How does your organization fix the broken telephone?
You can listen to my conversation about overcoming metadata challenges in data warehousing with Reuben Vandeventer from CNO Insurance and Sean Crowley from Informatica.
Blogger-in-Chief, Obsessive Compulsive Data Quality
Jim Harris is a recognized thought leader with over 20 years of enterprise data management experience, specializing in data quality and data governance. As Blogger-in-Chief at Obsessive Compulsive Data Quality, Jim offers an independent, vendor-neutral perspective, and hosts the popular audio podcast OCDQ Radio, syndicated on iTunes and Stitcher SmartRadio. Jim is an independent consultant and freelance writer for hire, as well as a regular contributor to Information-Management.com and DataRoundtable.com.