Tag Archives: Metadata
I’m glad you enjoyed my last letter explaining what data is and how people in my industry make a living managing it. After that letter, you confidently answered all data-related questions your knitting-circle friends could throw at you. But then Edward Snowden, former NSA contractor and world-renowned whistle-blower, came on the scene. Suddenly mainstream news anchors are talking about metadata.
I got your panicked voicemail and, as promised, I’m going to try to clarify what metadata is and how it relates to data. (more…)
In many ways, a data warehouse resembles the children’s game broken telephone, where a message is distributed across a group by being whispered into the ear of one player after another, until the last player announces the message to the entire group. Since errors typically accumulate in the retellings, the final version of the message often differs significantly from its source. Some players appear to deliberately alter what’s being said, guaranteeing a garbled message by the end of the game.
As data journeys from operational sources through the staging area, the data warehouse, the data marts, and finally into dashboards and reports, a lot can be lost in translation. As it is processed, data is often deliberately altered to fit the structure of its next target. Every time data moves from point to point, there is a possibility for semantic inconsistencies to be introduced. (more…)
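A minimal sketch of how a semantic inconsistency can creep in at one hop of such a pipeline. The stage functions, field names, and status codes below are all invented for illustration; the point is that a deliberate, schema-driven alteration at one stage can silently discard meaning that a downstream consumer then misreads.

```python
# Hypothetical pipeline stages; names and codes are invented for this sketch.

def extract_from_source(record):
    # The operational system stores a three-valued account status:
    # A = active, S = suspended, C = closed.
    return {"account_id": record["account_id"], "status": record["status"]}

def load_into_staging(record):
    # Staging collapses status into a boolean to fit its target schema,
    # a deliberate alteration that silently discards the 'suspended' state.
    return {"account_id": record["account_id"],
            "is_active": record["status"] == "A"}

def build_report_row(record):
    # The report reinterprets the boolean, unaware a third state ever existed.
    return {"account_id": record["account_id"],
            "status": "active" if record["is_active"] else "closed"}

source = {"account_id": 42, "status": "S"}  # a suspended account
row = build_report_row(load_into_staging(extract_from_source(source)))
print(row)  # the suspended account now reports as "closed"
```

No single stage is buggy in isolation; the inconsistency only appears when you compare the report against the source, which is exactly why it is so hard to catch.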
A short commentary on the fact that, in the end, John McAfee was caught not through superior sleuthing by the Guatemalan police but by a hacker who took him down thanks to metadata.
For those of you who would rather skip the details: McAfee, yes, that McAfee of antivirus software fame, fled his home in Belize after his next-door neighbor was found murdered. He went into hiding, blogging and tweeting along the way, which I am sure the authorities found quite amusing. In the end, he was caught because he gave an interview to a reporter who took a picture of him with his iPhone. The picture was posted on the Internet, and a hacker got hold of the photo file, which had the phone's GPS coordinates embedded in it. (more…)
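To make the metadata angle concrete: EXIF data in a photo stores latitude and longitude as degree/minute/second values plus a hemisphere reference, and converting them to a map location is trivial. The sketch below shows that conversion; the coordinate values are purely illustrative, not the actual ones from the McAfee photo.

```python
# Sketch of how GPS metadata embedded in a photo resolves to a location.
# The sample degree/minute/second values are invented for illustration.

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -decimal if ref in ("S", "W") else decimal

lat = dms_to_decimal(15, 39, 48.0, "N")
lon = dms_to_decimal(88, 35, 57.0, "W")
print(round(lat, 4), round(lon, 4))
```

Once the file is public, anyone can run this conversion; the phone recorded the location, and the metadata traveled with the image.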
Big data and related technologies such as Hadoop present significant opportunities and challenges to businesses. Nearly everybody in IT reports that they are actively evaluating big data technologies and, just as you would expect, they are at various stages of implementation. So, who has time to think about data governance when dealing with a massive change like this?
First, you have to get your arms around the new technology, right? Actually, this is exactly the right time to think about data governance for big data: before the wild, untamed data from outside the company starts getting mixed with your potentially more trustworthy, tamed internal data. (more…)
Does your organization have a structured repository of metadata that can help a data center operator (whether on-site or off-shore) quickly troubleshoot a production incident related to a data integration job at 2:00 a.m.? Or at any time of day, for that matter? This is just one use of metadata. A new Metadata Management whitepaper has just been published that describes the wide range of metadata types and uses, and the business value derived from them. (more…)
A number of customers have asked me recently about the benefits of using a business glossary product over a spreadsheet or SharePoint. The discussion is worth sharing.
If you have a smaller company and all you need is a list of standard business terms to provide a common business vocabulary across the company, a spreadsheet or SharePoint can work, up to a point. The problem is that once your organization reaches a certain size, you are going to have trouble scaling the management of the business terms, making them available across a larger organization, and fostering collaboration based on the agreed-upon business terms. (more…)
Data federation techniques help mitigate both the accessibility and the latency issues, but we still need to deal with the need for quality of content when employing a data virtualization approach. Within the ETL world, data inconsistencies and inaccuracies are dealt with through a separate data profiling and data quality phase. This works nicely when the data has already been dumped into a separate staging area. But with federation, not all the data sits within a segregated staging area. Loosely coupled integration with data quality tools may address some of the problem, but loose coupling can't eliminate those situations in which the inconsistencies are not immediately visible. (more…)
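One way to think about the gap: with no staging area to profile first, quality checks have to run in-flight, as federated rows stream by. The sketch below is a hypothetical illustration of that idea; the source generators, rule names, and field names are all invented, and a real data virtualization layer would be far more involved.

```python
# Minimal sketch of row-level data quality checks applied in-flight during
# federation, where no staging area exists to profile first. All sources,
# rules, and fields are invented for illustration.

def source_a():
    yield {"customer_id": "1001", "state": "CA"}
    yield {"customer_id": "", "state": "NY"}          # missing key

def source_b():
    yield {"customer_id": "2001", "state": "Calif."}  # non-standard code

RULES = [
    ("non_empty_id", lambda r: bool(r["customer_id"])),
    ("valid_state",  lambda r: len(r["state"]) == 2),
]

def federated_query(*sources):
    """Union rows from all sources, tagging each with any failed rules."""
    for src in sources:
        for row in src():
            failures = [name for name, check in RULES if not check(row)]
            yield {**row, "dq_failures": failures}

rows = list(federated_query(source_a, source_b))
print(rows)
```

Note what this cannot do: each rule sees one row at a time, so cross-source inconsistencies (the same customer keyed differently in two systems, say) remain invisible, which is exactly the residual problem the paragraph describes.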
In this video, Richard Cramer, chief healthcare strategist, and Claudia Chandra, senior director, product management, ILM, Informatica, discuss healthcare and application retirement.
During this discussion (the second of two videos), Richard and Claudia cover the following topics as they relate to healthcare:
- Application retirement project scope
- Application retirement enterprise IT initiatives
- How application retirement is the fastest way for IT to save money now
The first video, which discusses the business case for application retirement and additional drivers for application retirement, can be found here: http://www.youtube.com/watch?v=i7VHKY2tDlQ.
The conventional wisdom around master data management suggests that there are a few "core" master data domains that must be handled by the master data systems, namely customer and product. Of course there are a few other key domains as well, along with some critical master reference domains. But the suggestion that master data management (MDM) systems must provide a model for customer and a model for product implies a degree of simplicity that rarely exists, both in terms of the potentially conflicting definitions that can be applied to either of these two concepts, and in terms of the ability to manage the mapping between what should be the unified representation of those concepts and their actualized representations as they appear within existing application systems. (more…)
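To see why the mapping is the hard part, consider a hypothetical sketch of what an MDM hub has to maintain just to render one unified customer record in the local vocabulary of each application. The application names, field mappings, and record below are all invented; real mappings also have to bridge semantic differences, not just field names.

```python
# Hypothetical mapping between a unified "customer" representation and its
# differing shapes in existing applications. All names are invented.

UNIFIED_CUSTOMER = {"master_id": "C-001", "full_name": "Jane Q. Public",
                    "segment": "retail"}

# Each application defined "customer" long before MDM arrived, so each
# mapping bridges its own naming (and, in practice, semantic) differences.
APP_MAPPINGS = {
    "billing": {"full_name": "acct_holder", "master_id": "acct_ref"},
    "crm":     {"full_name": "contact_name", "segment": "tier",
                "master_id": "crm_key"},
}

def project(unified, app):
    """Render the unified record in one application's local vocabulary."""
    mapping = APP_MAPPINGS[app]
    return {local: unified[canonical]
            for canonical, local in mapping.items() if canonical in unified}

print(project(UNIFIED_CUSTOMER, "billing"))
```

Even this toy version shows the asymmetry: the billing system has no notion of "segment" at all, so the unified model must tolerate attributes that simply do not exist everywhere.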
I have harped on one specific aspect of data sharing and uncontrolled data repurposing: when the data consumer is not part of the requirements analysis and design processes, that consumer's perception of data semantics must be based on his/her own reinterpretation. For example, if I were to access a data set, say from data.gov, that had a list of failed banks, I am fed a CSV file with the following column headers:
| Bank Name | City | State | CERT # | Acquiring Institution | Closing Date | Updated Date |
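Mechanically, reading such a file is trivial; the gap is semantic. The sketch below parses a file with these headers using Python's `csv` module. The sample rows are invented, and note that even a plausible guess, say, that "CERT #" is an FDIC certificate number, is exactly the kind of reinterpretation the consumer is forced into when no metadata travels with the file.

```python
# Parsing the failed-banks CSV headers; the data rows are invented samples.
import csv
import io

sample = """Bank Name,City,State,CERT #,Acquiring Institution,Closing Date,Updated Date
First Example Bank,Springfield,IL,12345,Acme National Bank,14-Dec-12,2-Jan-13
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # Is "Closing Date" the legal closure, the regulatory takeover, or
    # something else entirely? The headers alone cannot tell us; that is
    # the semantic gap this post is about.
    print(row["Bank Name"], row["State"], row["Closing Date"])
```

The code runs fine either way; it is the meaning of each column that the consumer has to reconstruct on their own.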