Tag Archives: Metadata Management
I’m glad you enjoyed my last letter explaining what data is and how people in my industry make a living managing it. After that letter, you confidently answered all data-related questions your knitting-circle friends could throw at you. But then Edward Snowden, former NSA contractor and world-renowned whistle-blower, came on the scene. Suddenly mainstream news anchors are talking about metadata.
I got your panicked voicemail and, as promised, I’m going to try to clarify what metadata is and how it relates to data. (more…)
A number of customers have asked me recently about the benefits of using a business glossary product over using a spreadsheet or SharePoint. The discussion is worth sharing.
If you have a smaller company and all you need is a list of standard business terms to provide a common business vocabulary across the company, a spreadsheet or SharePoint can work… up to a point. The problem is that once your organization reaches a certain size, you will have trouble scaling the management of the business terms, making them available across a larger organization, and fostering collaboration based on the agreed-upon business terms. (more…)
It’s important to note that I didn’t title this post “Implementing a Data Governance Architecture”. Data governance is not a technology space, a tool, or an architecture. As our data governance framework illustrates, tools and architecture represent but one of many facets needed to support an enterprise data governance competency. But once you’ve defined your vision and business case with a clear approach for managing the people, process, and policy facets, technology can play a significant role in determining the ultimate success or failure of your data governance efforts.

Complex and poorly integrated current-state architectures present a significant obstacle to applying common standards for the delivery of trusted and secure data across the enterprise. Data architects play a pivotal role in enabling data governance by designing and evangelizing the data management reference architecture to support data quality and privacy requirements. In addition, these architects must recommend enabling technologies to support data governance and stewardship workflows that aid the core processes of discovery, definition, application, and measurement and monitoring. (Stay tuned – I’ll be sharing a lot more about these core data governance processes in a future post discussing the “Defined Processes” facet of our framework.)

Whatever you do, don’t fall into the all-too-common IT trap of selecting the tools before the goals, strategy, and processes of data governance are in place. If you skip these steps and just try to build it, they (“the business”) most assuredly will NOT come. (more…)
Setting aside any personal opinions on the health care mandate, I can’t help but be amused by the liberties taken by both major political parties with the definition of a “tax.” When Chief Justice Roberts delivered the majority opinion that the individual health insurance mandate was constitutional under Congress’ power to tax, the political spin doctors went into overdrive. Both sides are simultaneously insisting that it is and is not a tax in order to promote their agendas, and they have managed to confuse the heck out of the American public in the process. (This ABC News story prompted me to write about this.)
I bring this up here because this national debate on the constitutionality of “Obamacare” and the definition of what constitutes a tax is no different from many of the politically-charged debates occurring within your organizations with passions running equally high and confusion reigning supreme. (more…)
Metadata has the same challenges as data. It is created in silos, there is a lot of variation and inconsistency, it is growing exponentially, and it is of little value if not managed. For example:
- An entity relationship diagram in a data modeling tool is metadata about a database.
- An application portfolio in an EA repository is metadata about systems.
- Server information in a CMDB is metadata about IT assets.
- Mapping information in PowerCenter is metadata about data lineage.
- Data profiles on a data quality scorecard are metadata about the quality of information.
- An XML schema in a service registry is metadata about a canonical message format.
- A business glossary in a metadata repository is the metadata definition of business data.
- The status of a new BI report on a project dashboard is metadata about how data is changing. (more…)
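The pattern behind every example above is the same: metadata describes data that lives somewhere else. A minimal sketch of that split, using SQLite's built-in catalog (the table, column names, and values here are hypothetical, chosen only for illustration; the tools in the list above hold far richer metadata, but the principle is identical):

```python
import sqlite3

# An in-memory database with one small table of (hypothetical) customer data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 'EMEA')")

# The rows themselves are data...
data = conn.execute("SELECT * FROM customer").fetchall()

# ...while the table definition held in the catalog is metadata about that data.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(customer)")]

print(data)      # [(1, 'Acme Corp', 'EMEA')]
print(metadata)  # [('id', 'INTEGER'), ('name', 'TEXT'), ('region', 'TEXT')]
```

Change the data and the metadata stays put; change the schema and only the metadata moves. That independence is exactly why metadata needs its own management discipline.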
Consider this question: would you try to ride a bicycle blindfolded? You could probably pump the pedals and steer without trouble, but you would lack the visual feedback that confirms your changes in direction and velocity are keeping you on your intended course and out of harm’s way.
This question undoubtedly sounds crazy, but people are making changes to their data integration environments every day without the tools in place to visualize the environment and to tell them the impact of proposed changes.
There are good tools available today to help with this problem.
As I discussed in my last posting, ELT or pushdown optimization can significantly improve data warehousing performance, while reducing costs. I also mentioned it’s important to implement a data integration platform that supports both traditional ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) methods, because different situations call for different methods.
Taking this thought a step further, metadata is the critical binding agent that should cut across all data integration approaches, be they ELT or ETL or some other combination such as ETLT. If the actual transformation logic and business rules are defined as metadata, the choice of where the processing actually occurs, be it the ETL server or the database/data warehouse, becomes a matter of configuration rather than of coding. A truly metadata-driven data integration platform enables you to design and reuse the same transformation rules, regardless of whether you choose ELT or ETL for data warehousing. (more…)
Over the past several quarters, I’ve had the privilege of speaking with a number of companies involved in data governance. The interesting thing I found: firms that identified both drivers (business metadata and data quality) as critical, but invested in only one and not the other.
Case in point: a leading financial services firm implemented a data governance program to improve the comprehension and accuracy of the company’s existing board reports. I learned that one of their goals was to define their business terms and definitions (i.e. business metadata) to help non-technical users improve their understanding of the data used to run the business. What I found fascinating was that this was being done prior to addressing their data quality issues. In fact, when asked, “Do you have data quality challenges?” most business users said “yes”. Unfortunately, no one at this company knew to what extent. Instead, their focus was on defining their business metadata. This leads me to ask, “Can you trust your metadata without addressing your data quality issues as part of a data governance practice?”
If metadata is information about your data that your business users rely on to drive decisions, but the source data is not clean, how will that affect your business? The answer seems self-explanatory: you can’t trust your metadata if you have poor-quality data. For example, business metadata is often defined from an approved list of valid values. If the data used to define those values is incorrect, the downstream impact is that you end up with inaccurate metadata.
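A toy sketch of that chicken-and-egg problem (all values here are hypothetical): if the "approved" valid-value list is derived straight from uncleansed source data, the metadata inherits the data's defects; profiling the data against a governed list first quantifies the gap instead.

```python
# A governed, approved valid-value list for a (hypothetical) region code.
approved_regions = {"EMEA", "APAC", "AMER"}

# Raw source data, with the usual defects: trailing spaces, casing,
# placeholder values, and unapproved synonyms.
source_rows = ["EMEA", "emea ", "APAC", "N/A", "AMER", "AMERICAS"]

# Deriving "valid values" directly from dirty data pollutes the metadata...
derived = set(source_rows)

# ...while profiling against the governed list measures quality first.
violations = [v for v in source_rows if v not in approved_regions]

print(sorted(derived - approved_regions))  # ['AMERICAS', 'N/A', 'emea ']
print(len(violations))                     # 3
```

Half of the "valid values" a naive derivation would publish are defects, which is exactly how untreated data quality issues become untrustworthy business metadata.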
Organizations implementing data governance programs need to consider the lifecycle of how data is captured, processed, and delivered to downstream systems, whether that is a data warehouse, master data management application, data hub, or CRM system. Creating, defining, and publishing business metadata without addressing your data quality issues may not help companies looking to benefit from data governance.