Reminder: This blog has moved to http://blogs.informatica.com/perspectives
Posted in Uncategorized by Informatica
We will be shutting down this feed and blog, so please be sure to visit and bookmark the new blog.
Informatica has launched the Informatica Perspectives blog (RSS) where you can now find the latest Data Quality discussions among other topics. Please update your RSS subscription to track the following RSS feed for the latest blog posts on Data Quality.
Thanks,
The Informatica Team
![]() |
I ran across an interesting article concerning the US initiative to broker data exchange with various EU nations. The intent is to gain greater access to information that would help in the global war on terror.
European governments are entering into these agreements much more readily than they were four, five years ago, because concerns about terrorism are no longer confined to one side of the Atlantic.
The article then highlights the concerns over violation of personal privacy rights and the potential for abuse.
The agreement, which was described by two European officials, also allows for the transmission of "personal data revealing racial or ethnic origin, political opinion or religious or other beliefs, trade union membership or information concerning health and sexual life" in cases where they are "particularly relevant to the purposes of this agreement." It defines personal data as "any information relating to an identified or identifiable natural person."
The technology challenge can often be so consuming that we devote scant attention to the ethical issues involved. Data integration and identity resolution technology are continually advancing. By factoring ethical and moral considerations into the development of the technology, we should be able to support both objectives. Privacy and security need not be requirements that trade off against each other. In terms of identity resolution, the technology easily supports masking of personal attributes. Match results can be delivered independent of the conditions that trigger the match. Personal data used for matching can be stored transiently and safeguarded against open access. I'm sure we can debate the efficacy of the technology towards these objectives. But at the very least, we should include technology in the debate.
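To make the masking idea a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular identity resolution product works: the field names, the `SECRET_KEY` value, and the `mask`, `mask_record`, and `is_match` helpers are all hypothetical, and it only handles exact matches after simple normalization, whereas real identity resolution relies on fuzzy matching. The point is that records can be compared on keyed hashes rather than raw personal attributes, and that the caller receives only a yes/no answer, not the attributes that triggered the match.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be managed and rotated securely.
SECRET_KEY = b"example-key-for-illustration-only"


def mask(value: str) -> str:
    """Return a keyed hash of a normalized attribute value."""
    # Normalize case and whitespace so trivially different spellings agree.
    normalized = " ".join(value.lower().split())
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict) -> dict:
    """Mask every attribute so the raw personal data need not be retained."""
    return {field: mask(value) for field, value in record.items()}


def is_match(masked_a: dict, masked_b: dict, required: int = 2) -> bool:
    """Report only whether enough masked fields agree, not which ones."""
    shared = sum(
        1
        for field in masked_a
        if field in masked_b
        and hmac.compare_digest(masked_a[field], masked_b[field])
    )
    return shared >= required


a = mask_record({"name": "Jane  Doe", "birth_date": "1970-01-01", "city": "Boston"})
b = mask_record({"name": "jane doe", "birth_date": "1970-01-01", "city": "Cambridge"})
c = mask_record({"name": "John Roe", "birth_date": "1980-05-05", "city": "Boston"})

print(is_match(a, b))  # name and birth_date agree
print(is_match(a, c))  # only city agrees
```

In this sketch the matching step never sees a name or birth date in the clear, which is one way the privacy and security objectives could coexist.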
![]() |
Just gave a presentation at MIT's Information Quality conference hosted at the Sloan School of Management. Data Governance largely deals with softer topics like people, organizational strategies, and processes, not necessarily technology. The irony was not lost on anyone that this presentation, given at MIT, stressed that technology alone would not solve a company's data quality problems.
It was a real privilege and honor for me to return as a lecturer to some of the same classrooms I attended as a student. MIT's Sloan School is right next to the Media Lab, where I did undergraduate research some twenty years ago. The most profound takeaway from my time as an engineering student was the notion that technology alone could not solve hard problems. Back in 1986, we were experimenting with sending images and video over the network, and the professors were always stressing that social and organizational considerations factored heavily into technology adoption. This may sound obvious to grizzled IT veterans, but to the wide-eyed geeks studying at MIT, this came as quite a revelation. Certainly, this is the underlying driver behind Data Governance - it's a necessary framework so the enterprise can leverage and apply data quality, data integration, and metadata management technology.
The presentation covered several case studies involving successful customer deployments of enterprise-wide data governance programs. Many of the attendees commented that they found it necessary to gain initial wins on tactical projects so they could gain credibility and navigate the political issues behind an enterprise deployment. There was certainly some really vigorous discussion and debate on this topic.
What experience have you had with implementing a data governance program? Just like these MIT students, feel free to share your opinions with us.
![]() |
It's been rumored for a while, but now it is official - Microsoft has announced an agreement to buy a data quality startup company, Zoomix, for the purpose of enhancing SQL Server.
Microsoft plans to add Zoomix's technology to future releases of its SQL Server database, the company said through its public relations firm. Zoomix said its development team will join the SQL Server team at Microsoft's research and development center in Israel.
While this is not a large transaction for Microsoft, the move does underscore the importance of Data Quality. However, this raises an interesting question. Who should you trust to deliver data quality? The people who brought you Vista? The folks who sold you SAP? At first glance, it seems quite convenient to be able to deal with data quality issues in conjunction with specific source systems. However, many IT experts would claim this approach is merely a stop-gap measure. Data must be managed apart from its host systems. Data Quality rules start to truly add value to the business when they span MS SQL Server, SAP, Oracle, and so on. It's still a topic of debate. But the discussion has moved beyond the question of "is data quality software useful?" to "where is the most useful place to deliver data quality software?"
Feel free to post your opinions!
![]() |
If you ever find yourself discussing the benefits of data quality for your business and one of your associates asks rhetorically, "Yes, but can it solve world hunger?" you now have an answer for them.
The Food and Agriculture Organization of the United Nations records the level of completeness for data collection from each member nation. On their website, their stated mission is to work towards "a world without hunger." A key element in their fight against hunger is the FAOSTAT database, and a key means of maintaining the efficacy of the data is their data quality dashboard.
For organizations working with the FAO, it's important that the data be accurate - otherwise perishable goods may be wasted by getting shipped to locations whose populations are not suffering from malnourishment. This example highlights something that I've seen very often in the context of enterprise data quality initiatives. Many prospective customers come to us and ask "how do we get started, given the complexities of coordinating across multiple organizations inside our company?" Within the Informatica customer base, there are many examples of successful initiatives starting off with Data Quality metrics and dashboards. The metrics offer a great way for organizations to maintain a dialog on how to prioritize their investment in data quality.
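For readers wondering what a first data quality metric might look like, here is a rough sketch in Python of a completeness score, the kind of number a dashboard could track per field. It is not taken from the FAO or from any Informatica product: the `completeness` and `flag_below` helpers, the sample records, the field names, and the 0.8 threshold are all made up for illustration.

```python
def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value."""
    total = len(records)
    scores = {}
    for field in fields:
        # A value counts as present unless it is missing, None, or empty text.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        scores[field] = filled / total if total else 0.0
    return scores


def flag_below(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    """List the fields whose completeness falls under the threshold."""
    return sorted(f for f, s in scores.items() if s < threshold)


# Hypothetical submissions from four reporting regions.
submissions = [
    {"region": "A", "crop_yield": 3.2, "population": 120000},
    {"region": "B", "crop_yield": 2.7, "population": None},
    {"region": "C", "crop_yield": None, "population": 98000},
    {"region": "D", "crop_yield": 4.1, "population": ""},
]

scores = completeness(submissions, ["region", "crop_yield", "population"])
print(scores)
print(flag_below(scores))
```

Even a simple score like this gives different groups a shared number to discuss when deciding where to invest first.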
Already, I've received email comments on my posting. "Can Data Quality allow us to live longer? Facilitate the exploration of outer space?" Great questions… stay tuned for future postings!
![]() |
"Information Presentation Quality Characteristics"
This blog is the third and last of a series of blogs on the critical-to-quality characteristics of information quality required to achieve Total Information Quality Management. For information to have quality to knowledge workers:
The last set of quality characteristics that knowledge workers require is presentation quality characteristics, which we discuss here.
It is a fatal mistake to measure only the quality of the data content to determine Information Quality. Many process and decision failures result from poor quality presentation of the information.
Presentation quality is part of the human-machine interface. Presentation quality characteristics represent the "look and feel" of the finished information product. These characteristics are not just the prettiness or flashiness of the information presented, but represent the degree to which the information communicates the message in the data accurately and clearly to the information consumers so they can perform their work effectively.
Information Presentation Quality Characteristics:
The major information presentation (delivery or communication to information consumers) quality characteristics include:
A.1.1 Quality Characteristics of Information Presentation
Knowledge workers require different content quality characteristics based on their need for that information. Based on my work with dozens of clients, the major information presentation quality characteristics include:
Documents should use readability-enhancing techniques such as:
Methods such as "Information Mapping" help improve readability of documents.
Presentation Clarity. Information is presented in a way that communicates the truth of the information. Clear labels, footnotes, other explanatory notes, references, or links to definitions and/or documentation that clearly communicate the meaning and any anomalies in the information enhance presentation clarity.
Changes in data definition or in business rule specification can make comparisons of information across time boundaries inaccurate.
Signage Clarity. Signs and other information-bearing mechanisms like traffic signals should be standardized and made universal across the broadest audience possible.
Traffic signal lights are now standardized globally with red (stop), yellow (caution), and green (go) meanings. Furthermore, traffic signal lights have standard placements, with red on top and green at the bottom, so that for people with color-blindness the meaning is consistently associated with the position. The "redundancy" in this message system reduces error among those affected by color-blindness.
Presentation Objectivity. Information is presented without bias, enabling the knowledge worker to understand the meaning and significance without misinterpretation.
Numeric or quantitative data often requires graphical presentation. Objectivity means that the graphical or visual presentation of the information does NOT distort the truth as evidenced in the data.
Presentation Utility. Information is presented in a way that is intuitive and appropriate for the task at hand. The presentation of information will vary by the individual uses for which it is required. Some uses require concise presentation, while others require a complete, detailed presentation, and yet others require graphics, color-coding, or other highlighting techniques.
For more about Information Presentation Quality, see Chapter 6, "Assessing Information Quality," in Improving Data Warehouse and Information Quality. This contains a more comprehensive list of quality characteristics with examples. It also describes how to measure these quality characteristics.
What do you think? Share your experiences in measuring or improving information presentation quality.
![]() |
"Information Content Quality Characteristics," by Larry English
One of the root causes of poor quality information is defects in the data definition, specifically the "information product specifications." Because information is a product of our business, manufacturing and service processes, the analogy of an "information product" is real, and the requirement for quality in "information product specifications" is a critical requirement for Information Quality.
This blog is the second of a series of three blogs on the critical quality characteristics (or measures) of information quality required on the TIQM Quality System.
Information Content Quality Characteristics
Information Content Quality Characteristics: The major information content (data values) quality characteristics include:
For more about Information Content Quality, see Chapter 6, "Assessing Information Quality," in Improving Data Warehouse and Information Quality. This contains a more comprehensive list of quality characteristics with examples. It also describes how to measure these quality characteristics. The next blog will discuss information presentation quality characteristics required for the finished Information Product presented to the knowledge workers.
What do you think? Share your experiences in measuring information content quality, especially accuracy.
![]() |
Gartner released the 2008 Magic Quadrant for Data Quality Tools on Wednesday, June 4th (Gartner, Inc., "Magic Quadrant for Data Quality Tools" by Ted Friedman and Andreas Bitterer, June 4, 2008). In reading through the report, I was excited to see that Informatica was positioned in the Leaders Quadrant. If you haven't read the full report, I recommend heading to Informatica.com and requesting a complimentary copy, as it provides significant insight into many of the vendors in the space.
![]() |
Gartner just released this year's report on their Data Quality Tools Magic Quadrant, ranking Informatica right in the mix among vendors listed in the leaders quadrant. These vendors have been recognized leaders for many years. Some may ask "how is it that Informatica has grown so fast to be recognized as a leader in data quality only two years after entering the market?" Gartner cites many of the reasons, including significant adoption, strong data profiling, domain-agnostic data cleansing, ease of use and positive support and services experiences.
Another reason that has often been discussed by industry analysts is the convergence of the data integration and data quality markets. Customers benefit from a tremendous amount of synergy between data integration and data quality. Anyone doing a data migration, data consolidation, data warehouse, or MDM project would consider that project a complete failure if the data were not accurate and consistent. A compelling data quality product must be built on top of a comprehensive data integration platform. Customers can then achieve the levels of scalability, volume processing speed, real-time responsiveness, and near-universal connectivity that the best data integration products provide.
There may be a cluster of vendors in the leaders quadrant for the Data Quality MQ, but in the Gartner Data Integration Magic Quadrant, the choices are much clearer. When making strategic buying decisions, customers can simply look at the intersection of those two reports and quickly discover which vendors offer the best products.