Data Integration - Informatica

Informatica Data Quality

National Security vs. Privacy Rights - the Role for Technology

Ivan Chong

I ran across an interesting article concerning the US initiative to broker data exchange with various EU nations. The intent is to gain greater access to information that would help in the global war on terror.

European governments are entering into these agreements much more readily than they were four, five years ago, because concerns about terrorism are no longer confined to one side of the Atlantic.

The article then highlights the concerns over violation of personal privacy rights and the potential for abuse.

The agreement, which was described by two European officials, also allows for the transmission of "personal data revealing racial or ethnic origin, political opinion or religious or other beliefs, trade union membership or information concerning health and sexual life" in cases where they are "particularly relevant to the purposes of this agreement." It defines personal data as "any information relating to an identified or identifiable natural person."

The technology challenge can be so consuming that we devote scant attention to the ethical issues involved. Data integration and identity resolution technology are continually advancing. By factoring ethical and moral considerations into the development of the technology, we should be able to support both objectives. Privacy and security need not be requirements that trade off against each other. In identity resolution, for example, the technology readily supports masking of personal attributes. Match results can be delivered independently of the conditions that trigger the match. Personal data used for matching can be stored transiently and safeguarded against open access, and so on. I'm sure we can debate the efficacy of the technology towards these objectives. But at the least, we should include technology in the debate.
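To make the masking idea a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: the key, the field choice and the function names (`mask`, `masked_record`, `is_match`) are all hypothetical. Personal attributes are replaced by keyed hashes before comparison, and the caller learns only whether a match occurred, not which attributes triggered it.

```python
import hashlib
import hmac
from typing import Set, Tuple

# Hypothetical secret key; a real deployment would keep it outside the matching service.
SECRET_KEY = b"example-key-for-illustration-only"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not block a match."""
    return " ".join(value.lower().split())


def mask(value: str) -> str:
    """Return a keyed hash so the raw attribute never has to be stored or shared."""
    digest = hmac.new(SECRET_KEY, normalize(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def masked_record(name: str, date_of_birth: str) -> Tuple[str, str]:
    """Build the masked form of a record; only this form is kept for matching."""
    return (mask(name), mask(date_of_birth))


def is_match(candidate: Tuple[str, str], watchlist: Set[Tuple[str, str]]) -> bool:
    """Report only whether a match occurred, not the attributes behind it."""
    return candidate in watchlist


watchlist = {masked_record("Jane  Example", "1970-01-01")}
print(is_match(masked_record("jane example", "1970-01-01"), watchlist))
print(is_match(masked_record("John Sample", "1980-02-02"), watchlist))
```

The sketch only handles exact matches after normalization. Fuzzy matching over masked data is a harder problem and is deliberately left out here.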

You can't have CDI without Data Quality

Tom Golden

Looking in Webopedia.com recently, I came across a definition for CDI. Yes, Webopedia.com: it bills itself as the #1 online encyclopedia dedicated to computer technology. You might wonder what I was doing surfing this fount of knowledge. Well, I had time on my hands between delayed flights coming back to Europe from the US. You know what they say: "time to spare, travel by air."

The Webopedia.com CDI definition went: "Short for Customer Data Integration, it is the combination of the technology, processes, and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines, and, potentially, enterprises, where there are multiple sources of customer data in multiple application systems and databases."

A bit long-winded perhaps, but the three words that shone out at me through the glare of the fluorescent lights in San Francisco airport were "accurate, timely and complete"; all data quality issues. Despite this, few if any of the Customer Data Integration (CDI) vendors in the market today have truly addressed the data quality issues in their CDI solutions. And anyone who has gone down the route of developing their own custom-built CDI application will be all too familiar with the data quality demands involved.

Business and IT Collaboration is Essential for Data Quality

Ivan Chong

A recent InformationWeek article* described the growth in IT employment across the US as a result of a shift in skills. Rather than focusing on pure IT proficiency, organizations are looking for talent with “a more hybrid mix of technology skills, along with an understanding of the business and its customers.”

IT departments are highly motivated to increase the level of collaboration with their counterparts in the business. Nowhere is this more critical than in the area of data quality, and the trend is causing a shift in the way companies are looking to solve their data quality issues. First-generation data quality tools had a natural focus on technology instead of business. Here are some of the differences between technology-focused data quality solutions and business-focused data quality solutions.

Tools vs. Process
Technology-focused data quality solutions provide tools that automate data processing. Evidence of this type of focus can be seen in the way that vendors will tout the sophistication and type of their algorithms over and above their ability to support ongoing data quality management processes. While technology is extremely important, its relevance cannot eclipse the overall data quality management process. Even if your data quality tool can automate the correction of 95 percent of the data, if the remaining five percent cannot be managed properly, you will continue to suffer from poor data quality.
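As a rough illustration of that last point, the sketch below pairs an automated correction rule with a review queue for whatever the rule cannot fix. Everything in it is hypothetical (the `ReviewQueue` class, the country lookup, the sample records); it simply shows the shape of a process in which the uncorrectable remainder is tracked rather than dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewQueue:
    """Hypothetical holding area for records that automation could not fix."""
    pending: List[dict] = field(default_factory=list)

    def add(self, record: dict, reason: str) -> None:
        self.pending.append({"record": record, "reason": reason})


def auto_correct_country(record: dict) -> Optional[dict]:
    """Toy correction rule: map known country spellings to a standard code."""
    known = {"usa": "US", "united states": "US", "ireland": "IE", "eire": "IE"}
    code = known.get(record.get("country", "").strip().lower())
    if code is None:
        return None  # automation cannot resolve this value
    return {**record, "country": code}


def process(records: List[dict], queue: ReviewQueue) -> List[dict]:
    """Correct what can be corrected; route everything else to a person."""
    corrected = []
    for record in records:
        fixed = auto_correct_country(record)
        if fixed is None:
            queue.add(record, "unrecognized country")
        else:
            corrected.append(fixed)
    return corrected


queue = ReviewQueue()
clean = process(
    [
        {"id": 1, "country": "USA"},
        {"id": 2, "country": "Eire"},
        {"id": 3, "country": "Atlantis"},
    ],
    queue,
)
print(len(clean), "corrected automatically;", len(queue.pending), "sent for review")
```

The design choice worth noting is that the queue records a reason alongside each record, so the business side can see why automation gave up and decide what to do next.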

If all master data was like customer data . . .

Garry Moroney

Managing the quality of customer data has its challenges: it is typically collected from a wide range of sources and channels, and very often those responsible for entering or capturing data have no incentive to do so accurately. Even if they do, much of it, including contact data, can go out of date rapidly. Despite this, most customer data has one major advantage over many other types of data: agreed and accepted standards and reference data. While these standards do vary from country to country, they are at least universally understood and have an enormous impact on the approach and the effort required for managing data quality.

Because of these global standards and references, there is general agreement on what a complete, valid and correctly formatted address should look like; likewise a person or business name, telephone number, date of birth, email address, etc. This means that if I am sharing my customer data with my business partners, at least we have a common view of what high quality data should look like and the checks we need to make to assess the quality levels.
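A small sketch of what such shared checks might look like follows. The patterns are deliberately loose and purely illustrative, and the field names and the `quality_report` function are assumptions made for the example; real address, phone and email validation relies on country-specific rules and reference data.

```python
import re
from datetime import date, datetime
from typing import Callable, Dict, List

# Illustrative patterns only; real standards are stricter and vary by country.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_PATTERN = re.compile(r"^\+?[0-9][0-9 \-]{6,18}$")


def valid_email(value: str) -> bool:
    return bool(EMAIL_PATTERN.match(value))


def valid_phone(value: str) -> bool:
    return bool(PHONE_PATTERN.match(value))


def valid_date_of_birth(value: str) -> bool:
    """Accept ISO dates (YYYY-MM-DD) that are real calendar dates and not in the future."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date() <= date.today()
    except ValueError:
        return False


CHECKS: Dict[str, Callable[[str], bool]] = {
    "email": valid_email,
    "phone": valid_phone,
    "date_of_birth": valid_date_of_birth,
}


def quality_report(records: List[dict]) -> Dict[str, float]:
    """Share of records, per field, that are present and pass the check."""
    report = {}
    for field_name, check in CHECKS.items():
        passed = sum(1 for r in records if r.get(field_name) and check(r[field_name]))
        report[field_name] = passed / len(records) if records else 0.0
    return report


records = [
    {"email": "jane@example.com", "phone": "+353 1 234 5678", "date_of_birth": "1970-01-01"},
    {"email": "not-an-email", "phone": "12345", "date_of_birth": "1970-02-30"},
]
print(quality_report(records))
```

Because both partners can run the same checks, the resulting percentages give them a common yardstick for the data they exchange.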

Another huge benefit is that third-party service providers and technology vendors also understand the requirements and standards by which to measure and improve customer data, and they know that these requirements are largely the same for all vendors. As a result, large numbers of service bureaux and technology vendors are able to offer well developed, generic, out-of-the-box products and services to tackle customer data quality issues. These can deliver a lot of value with minimal or no customization, and the effort to acquire and implement these solutions is small.

Matching: Determinism and Probability in a new context

Chris McCauley

I wanted to follow up on my comments about technical versus business metadata and had intended to make this article about metadata, but instead I'm going to kill two birds with one stone and talk about approaches to matching. Why? Well, recently I found myself getting dragged into a very old argument about probabilistic versus deterministic matching. Normally I'm very happy to get into any techie argument, and the more pointless the better, but this one has been done to death, so I feigned ignorance and escaped with a new idea for a blog entry.

The argument was about whether a deterministic or probabilistic approach to matching is better. After six years in data quality technology, I'm happy enough to argue for either approach and, given a strict choice between "A" and "B," can make a convincing case for option "C."
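For readers new to the argument, here is a minimal sketch of the two approaches. It is not taken from any product: the field names, the m/u probabilities and the threshold are hypothetical values chosen for illustration. The deterministic rule demands exact agreement on every key field, while the probabilistic scorer sums log-likelihood agreement weights in the style of Fellegi-Sunter record linkage and compares the total to a threshold.

```python
import math
from typing import Dict, Tuple

FIELDS = ("first_name", "surname", "postcode")

# Hypothetical parameters per field:
#   m = probability the field agrees when two records are the same person
#   u = probability the field agrees when they are different people
M_U: Dict[str, Tuple[float, float]] = {
    "first_name": (0.90, 0.05),
    "surname": (0.95, 0.01),
    "postcode": (0.90, 0.001),
}


def agrees(a: dict, b: dict, field_name: str) -> bool:
    return a[field_name].strip().lower() == b[field_name].strip().lower()


def deterministic_match(a: dict, b: dict) -> bool:
    """Rule-based: a match requires exact agreement on every key field."""
    return all(agrees(a, b, f) for f in FIELDS)


def probabilistic_score(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios: positive for agreement, negative for disagreement."""
    score = 0.0
    for f in FIELDS:
        m, u = M_U[f]
        if agrees(a, b, f):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def probabilistic_match(a: dict, b: dict, threshold: float = 8.0) -> bool:
    """Declare a match when the accumulated evidence clears a hypothetical threshold."""
    return probabilistic_score(a, b) >= threshold


jon = {"first_name": "Jon", "surname": "Smith", "postcode": "D04 X1Y2"}
john = {"first_name": "John", "surname": "Smith", "postcode": "D04 X1Y2"}
other = {"first_name": "John", "surname": "Jones", "postcode": "T12 AB34"}

print(deterministic_match(jon, john), probabilistic_match(jon, john))
print(deterministic_match(john, other), probabilistic_match(john, other))
```

In this toy data the spelling variation in the first name defeats the deterministic rule, while the probabilistic score still clears the threshold because surname and postcode agree. Whether that is the right answer depends entirely on the parameters chosen.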

Information Quality and Accountability

Larry English

All he wanted for Christmas was anything but what he got. Jeffrey Skilling, former Enron CEO, moved to his new residence at the Federal Correctional Institution in Waseca, Minnesota, where his sentence calls for him to live for the next 24 years for his role in fraud, conspiracy, insider trading and other crimes leading to the collapse of Enron. These crimes led to the loss of thousands of jobs, more than $60 billion in company stock and more than $2 billion in employee pension plans.

But Mr. Skilling will have a new job as well. He will probably work as a food service helper, painter or plumber. This is not the cushy job he had as CEO at Enron, where he earned $151.7 million over the three years in which he perpetrated his fraud; he will now get from 14 to 40 cents per hour. At the top pay, Skilling could earn $832 per year. At that rate, it would take 74.5 million years to pay back the stock and pension losses he foisted on the stakeholders.
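The arithmetic behind those figures can be checked quickly. The sketch below assumes a 40-hour week, a 52-week year and total losses of $62 billion (the lower bounds of the stock and pension figures quoted above); none of these assumptions is stated in the post, but together they reproduce its numbers.

```python
HOURLY_RATE = 0.40    # top prison pay quoted in the post, in dollars per hour
HOURS_PER_WEEK = 40   # assumption: a standard working week
WEEKS_PER_YEAR = 52   # assumption: no weeks off

annual_pay = HOURLY_RATE * HOURS_PER_WEEK * WEEKS_PER_YEAR

# Assumption: take the lower bounds of "more than $60 billion" in stock
# and "more than $2 billion" in pension plans.
total_losses = 60e9 + 2e9

years_to_repay = total_losses / annual_pay

print(f"Annual pay: ${annual_pay:,.0f}")
print(f"Years to repay: {years_to_repay / 1e6:.1f} million")
```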

So what is the point here?
