blogs.informatica.com
Informatica: The Data Integration Company

Technology Archives

February 01, 2008

You can’t have CDI without Data Quality

Posted by Tom Golden in: Data Quality > Benefits ; Data Quality > Best Practices ; Data Quality ; Data Quality > Technology

Looking in Webopedia.com recently, I came across a definition for CDI. Yes, Webopedia.com: it bills itself as the #1 online encyclopaedia dedicated to computer technology. You might wonder what I was doing surfing this fount of knowledge. Well, I had time on my hands between delayed flights coming back to Europe from the US. You know what they say: “time to spare, travel by air.”

The Webopedia.com CDI definition went: “Short for Customer Data Integration, it is the combination of the technology, processes, and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines, and, potentially, enterprises, where there are multiple sources of customer data in multiple application systems and databases.”

A bit long-winded perhaps, but the three words that shone out at me through the glare of the fluorescent lights in San Francisco airport were “accurate, timely and complete”; all data quality issues. Despite this, few if any of the Customer Data Integration (CDI) vendors in the market today have truly addressed the data quality issues in their CDI solutions. And anyone who has gone down the route of developing their own custom-built CDI application will be all too familiar with the data quality demands involved.

Continue reading "You can’t have CDI without Data Quality" »

December 05, 2007

Business and IT Collaboration is Essential for Data Quality

Posted by Ivan Chong in: Data Quality > Best Practices ; Data Quality ; Data Quality > Management ; Data Quality > Technology

A recent InformationWeek article* described the growth in IT employment across the US as a result of a shift in skills. Rather than focusing on pure IT proficiency, organizations are looking for talent with “a more hybrid mix of technology skills, along with an understanding of the business and its customers.”

IT departments are highly motivated to increase the level of collaboration with their counterparts in the business. Nowhere is this more critical than in the area of data quality, and the trend is causing a shift in the way companies are looking to solve their data quality issues. First-generation data quality tools had a natural focus on technology instead of business. Here are some of the differences between technology-focused data quality solutions and business-focused data quality solutions.

Tools vs. Process
Technology-focused data quality solutions provide tools that automate data processing. Evidence of this type of focus can be seen in the way that vendors will tout the sophistication and type of their algorithms over and above their ability to support ongoing data quality management processes. While technology is extremely important, its relevance cannot eclipse the overall data quality management process. Even if your data quality tool can automate the correction of 95 percent of the data, you will continue to suffer from poor data quality if the remaining five percent cannot be managed properly.

Continue reading "Business and IT Collaboration is Essential for Data Quality" »

February 19, 2007

If all master data was like customer data . . .

Posted by Garry Moroney in: Data Quality ; Data Quality > Management ; Data Quality > Technology ; Data Quality > Vertical Solutions

Managing the quality of customer data has its challenges: it is typically collected from a wide range of sources and channels, and very often those responsible for entering or capturing data have no incentive to do so accurately. Even if they do, much of it, including contact data, can go out of date rapidly. Despite this, most customer data has one major advantage over many other types of data: agreed and accepted standards and reference data. While these standards do vary from country to country, they are at least universally understood and have an enormous impact on the approach and the effort required for managing data quality.

Because of these global standards and references, there is general agreement on what a complete, valid and correctly formatted address should look like; likewise for a person or business name, telephone number, date of birth, email address and so on. This means that if I am sharing my customer data with my business partners, at least we have a common view of what high quality data should look like and of the checks we need to make to assess the quality levels.
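To make the idea of shared checks a little more concrete, here is a minimal sketch of format checks on a customer record. It is only an illustration: the field names (`email`, `phone`, `date_of_birth`), the deliberately loose email pattern, and the choice of the E.164 phone format and ISO 8601 dates are all assumptions made for this example, not something prescribed by the post or by any particular product.

```python
import re
from datetime import date
from typing import Dict, Optional

# Assumption: a loose "something@something.tld" shape, not full email validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Assumption: phone numbers are stored in E.164 form ("+" then up to 15 digits).
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")


def valid_date_of_birth(value: str, today: date) -> bool:
    """True if value is a real ISO 8601 calendar date that is not in the future."""
    try:
        dob = date.fromisoformat(value)
    except ValueError:
        return False
    return dob <= today


def check_record(record: Dict[str, str], today: Optional[date] = None) -> Dict[str, bool]:
    """Return a pass/fail flag per field; a missing field counts as a failure."""
    today = today or date.today()
    return {
        "email": bool(EMAIL_RE.match(record.get("email", ""))),
        "phone": bool(E164_RE.match(record.get("phone", ""))),
        "date_of_birth": valid_date_of_birth(record.get("date_of_birth", ""), today),
    }


# Hypothetical record: the date of birth is not a real calendar date.
sample = {
    "email": "jane@example.com",
    "phone": "+35312345678",
    "date_of_birth": "1980-02-30",
}
result = check_record(sample, today=date(2007, 2, 19))
```

Because both partners can agree on rules like these in advance, the same function could in principle be run on either side of a data exchange and yield comparable quality scores.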

Another huge benefit is that third-party service providers and technology vendors also understand the requirements and standards by which to measure and improve customer data, and they know that these requirements are largely the same for all vendors. As a result, large numbers of service bureaux and technology vendors are able to offer well developed, generic, out-of-the-box products and services to tackle customer data quality issues. These can deliver a lot of value with minimal or no customization, and the effort to acquire and implement these solutions is small.

Continue reading "If all master data was like customer data . . ." »

Data Quality Metadata; a lot more than just "data about data"

Posted by Chris McCauley in: Data Quality > Best Practices ; Data Quality ; Data Quality > Technology ; Data Quality > Vertical Solutions

On reflection, dipping into details of matching technologies in my last blog entry wasn't that much of a detour from the subject of metadata. It broached the idea that one technology was better than another because of its ability to better handle the context in which it was used. There are a number of themes that run through my work on data quality at Informatica and one of them is "metadata as context".

By way of explanation, let's pretend that you are the newly hired Head of Engineering in a software company bringing a product to market (maybe a new operating system). The quality assurance team has completed its testing and you've just been told that it found no "show-stopper" defects, some usability "gotchas" and a smattering of documentation problems. The pressure is on; Sales has been promising these features to customers for weeks and Marketing announced this stuff months ago: to ship or not to ship, that is the question.

The QA stats are, on the surface, very reassuring, but we need to know some more about how the team arrived at those numbers before we start pressing CDs. If you found out that the team had been added late in a very long development process and was unfamiliar with the product, would you still be sitting comfortably?

You can see how the words "context" and "metadata" could be used interchangeably when thinking about the scenario above. Harking back to the discussion about Probabilistic matching systems, there's value in understanding the context in which you are operating. Metadata can be used to capture some aspects of context, hence "metadata as context".

Continue reading "Data Quality Metadata; a lot more than just "data about data"" »

January 31, 2007

Matching: Determinism and Probability in a new context

Posted by Chris McCauley in: Data Quality ; Data Quality > Technology

I wanted to follow up on my comments about technical versus business metadata and had intended to make this article about metadata, but instead I'm going to kill two birds with one stone and talk about approaches to matching. Why? Well, recently I found myself getting dragged into a very old argument about Probabilistic versus Deterministic matching. Normally I'm very happy to get into any techie argument, and the more pointless the better, but this one has been done to death, so I feigned ignorance and escaped with a new idea for a blog entry.

The argument was about whether a deterministic or probabilistic approach to matching is better. After six years in data quality technology, I'm happy enough to argue for either approach and given a strict choice between "A" and "B" can make a convincing case for option "C."
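For readers who have not met the two camps, a rough sketch of the difference might look like this. Everything here is illustrative: the field names, weights and threshold are invented, and the weighted fuzzy score is a simplified stand-in for probabilistic matching, which in its fuller form derives its weights from estimated agreement probabilities rather than hand-picked numbers.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence

Record = Dict[str, str]

# Assumption: hand-picked weights; a real system would estimate these from data.
WEIGHTS = {"name": 0.7, "postcode": 0.3}


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def deterministic_match(a: Record, b: Record,
                        keys: Sequence[str] = ("name", "postcode")) -> bool:
    """Rule-based: two records match only if every key field agrees exactly."""
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)


def similarity_score(a: Record, b: Record) -> float:
    """Weighted average of per-field string similarity, between 0.0 and 1.0."""
    return sum(
        weight * SequenceMatcher(None, normalize(a[field]), normalize(b[field])).ratio()
        for field, weight in WEIGHTS.items()
    )


def score_based_match(a: Record, b: Record, threshold: float = 0.85) -> bool:
    """Score-based: records match when the weighted similarity clears a threshold."""
    return similarity_score(a, b) >= threshold


# Hypothetical records that differ by a single letter in the name.
rec_a = {"name": "Jon Smith", "postcode": "D02 X285"}
rec_b = {"name": "John Smith", "postcode": "D02 X285"}

exact = deterministic_match(rec_a, rec_b)   # the names differ, so the rule fails
fuzzy = score_based_match(rec_a, rec_b)     # the names are close, so the score passes
```

The exact rule is easy to explain and audit but misses the near-duplicate; the scored approach catches it at the price of having to choose and defend weights and a threshold.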

Continue reading "Matching: Determinism and Probability in a new context" »

December 19, 2006

Information Quality and Accountability

Posted by Larry English in: Data Quality ; Data Quality > Governance / Stewardship ; Data Quality > Management ; Data Quality > Technology

All he wanted for Christmas was anything but what he got. Jeffrey Skilling, former Enron CEO, moved to his new residence at the Federal Correctional Institution in Waseca, Minnesota, where his sentence calls for him to live for the next 24 years for his role in fraud, conspiracy, insider trading and other crimes leading to the collapse of Enron. These crimes led to the loss of thousands of jobs, more than $60 billion in company stock and more than $2 billion in employee pension plans.

But Mr. Skilling will have a new job as well. He will probably work as a food service helper, painter or plumber. This is not the cushy job he had as CEO at Enron, where he earned $151.7 million over the three years during which he perpetrated his fraud; he will get from 14 to 40 cents per hour. At the top pay, Skilling could earn $832 per year. At that rate it would take 74.5 million years to pay back the stock and pension losses he foisted on the stakeholders.
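The arithmetic behind those figures can be checked in a few lines. This sketch assumes a full-time year of 40 hours a week for 52 weeks, and it treats the "more than $60 billion" and "more than $2 billion" losses as exactly $60 billion and $2 billion; neither assumption is stated in the post itself.

```python
# Assumption: a full-time working year of 40 hours x 52 weeks.
HOURS_PER_YEAR = 40 * 52

# Top prison pay quoted in the post, kept in whole cents to avoid rounding noise.
TOP_PAY_CENTS_PER_HOUR = 40

# Assumption: losses rounded down to the figures quoted in the post.
STOCK_LOSS_DOLLARS = 60_000_000_000
PENSION_LOSS_DOLLARS = 2_000_000_000

annual_pay_cents = TOP_PAY_CENTS_PER_HOUR * HOURS_PER_YEAR
annual_pay_dollars = annual_pay_cents / 100

total_loss_cents = (STOCK_LOSS_DOLLARS + PENSION_LOSS_DOLLARS) * 100
years_to_repay = total_loss_cents / annual_pay_cents

print(f"Annual pay at the top rate: ${annual_pay_dollars:,.2f}")
print(f"Years to repay: {years_to_repay / 1e6:.1f} million")
```

Under these assumptions the top rate works out to $832 a year, and repaying $62 billion at that pace takes roughly 74.5 million years, which agrees with the figures above.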

So what is the point here?

Continue reading "Information Quality and Accountability" »