Data Integration - Informatica

Informatica Data Quality

Better management through measuring data quality

Ivan Chong

I recently asked a customer of ours why they invested so much in monitoring and publishing key performance indicators for their data quality. “Believe it or not, the biggest reason we measure data quality is not to correct bad data,” came the reply. “The reason we monitor data quality is to detect problems with our business processes.”

Indeed, as I mentioned in my last blog post, business users look to investments in people and processes, in addition to technology, in order to address poor data quality. For example, if a bank branch manager received a report showing that customer data originating from his branch office had a much higher incidence of duplicate entries and was putting the entire bank at risk of massive regulatory fines, he would not throw technology at the problem. His response might be mandatory training for tellers or better hiring practices to screen for adequate computer skills.

Experts in quality control methodology refer to this as addressing “root cause.” Common starting points of measurement involve completeness, accuracy, consistency, conformity, duplication, and integrity. Eventually, as the business culture matures its data quality practices, timeliness and data lineage (origination) are also used to evaluate the quality of data. Of course, software technology that automates the process of parsing, standardizing, matching and consolidating data is of immense value and is an absolute requirement in any data integration project. However, the issue of data quality goes beyond these IT projects. Ongoing measurement and monitoring of data quality provide value directly to the business because they help it to better manage its people and processes.
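
To make the idea of a per-branch data quality report a little more concrete, here is a minimal sketch in Python that scores three of the dimensions named above (completeness, conformity and duplication) for each branch. The record layout, the field names, the sample values and the email pattern are all assumptions made up for illustration; they do not come from any customer system or from an Informatica product.

```python
import re
from collections import defaultdict

# Hypothetical customer records; field names and values are invented for illustration.
RECORDS = [
    {"branch": "North", "name": "Ann Lee", "email": "ann@example.com", "phone": "555-0101"},
    {"branch": "North", "name": "Ann Lee", "email": "ann@example.com", "phone": "555-0101"},
    {"branch": "North", "name": "Bo Chan", "email": "", "phone": "555-0102"},
    {"branch": "South", "name": "Cy Diaz", "email": "cy@example", "phone": "555-0103"},
    {"branch": "South", "name": "Di Evans", "email": "di@example.com", "phone": "555-0104"},
]

# Deliberately simple conformity rule (an assumption): something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("name", "email", "phone")


def quality_by_branch(records):
    """Return completeness, conformity and duplication rates per branch."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["branch"]].append(record)

    report = {}
    for branch, rows in grouped.items():
        total = len(rows)
        # Completeness: every required field is non-empty.
        complete = sum(
            1 for r in rows if all(r[f].strip() for f in REQUIRED_FIELDS)
        )
        # Conformity: the email matches the expected format.
        conforming = sum(1 for r in rows if EMAIL_PATTERN.match(r["email"]))
        # Duplication: records beyond the first with the same name and email.
        keys = {(r["name"].lower(), r["email"].lower()) for r in rows}
        duplicates = total - len(keys)
        report[branch] = {
            "completeness": complete / total,
            "conformity": conforming / total,
            "duplication": duplicates / total,
        }
    return report


if __name__ == "__main__":
    for branch, scores in quality_by_branch(RECORDS).items():
        print(branch, {name: round(value, 2) for name, value in scores.items()})
```

Grouping the scores by originating branch is the point of the exercise: a branch whose duplication rate stands out is a signal to look at training or process at that branch, rather than at the cleansing technology.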

The self-service data quality minefield

Tom Golden

Marshall Field, the American department-store owner and retail merchandizing pioneer, is usually credited with coming up with the now much-abused adage “Right or wrong, the customer is always right”. These days the saying is usually shortened to the more functional “the customer is always right”. But with self-service data entry, where more and more customers are responsible for supplying their own contact and order details via web applications, it is worth revisiting the entire quote.

The fact is that even when the customer is responsible for “bad data”, the vendor still has to shoulder the blame and do something about it.

Take the example of a friend of mine who recently signed up with a well-known social networking website. Let’s ignore the reasons why he signed up – he’s not entirely sure – or what social networking sites are for – none of us is entirely sure yet, but I guess we’ll start to get the point as more and more of them are sold to industry giants for n billion dollars.

[Read more]

Is there life on Mars?

Chris McCauley

This week NASA announced that it may have discovered evidence of water flowing on the surface of Mars in the recent past. This raises the possibility that life existed on Mars in the past and may even exist there today.

A lot of talent, time and money has gone into addressing the fundamental question: is there life on Mars? In addition to the two rovers on the surface, there are currently three spacecraft in operation around Mars: Mars Odyssey, Mars Reconnaissance Orbiter and Mars Express. Sadly, the Mars Global Surveyor, which is responsible for this week’s exciting news, has probably suffered a severe failure and is in effect lost. Each spacecraft has been sent out to gather some basic statistics about the Red Planet, such as how the surface changes over time, the percentage of carbon dioxide at the poles, or how the temperature varies throughout the atmosphere. All of this is in the hope that we move closer to a definitive answer to that fundamental question. But forget the answer: are we asking the right questions?

[Read more]

Start small with monitoring, but always think big to achieve data quality goals

Tom Golden

I attended my first parent-teacher meeting the other day for my five-year-old daughter. Another one of those “life stage” events done and dusted – I remember dreading the annual meeting when I was a kid. The notion of my parents and my teacher comparing notes on my behaviour was too much to bear – somebody was eventually going to put two and two together and find out I was up to no good.

It all got me thinking about a recent blog post by my esteemed colleague Garry Moroney. His post Mobilizing the Data Quality Army outlined the level of effort, thought and planning that the US Department of Education is putting into data quality.

As Garry points out, dealing with data quality in a large, disconnected organization such as the US schools system is not a trivial exercise. But if you were to read only that one post, you might be overwhelmed by the potential size of the data quality task in front of you.

[Read more]

Business and IT Collaboration is Essential for Data Quality

Ivan Chong

A recent InformationWeek article* described the growth in IT employment across the US as a result of a shift in skills. Rather than focusing on pure IT proficiency, organizations are looking for talent with “a more hybrid mix of technology skills, along with an understanding of the business and its customers.”

IT departments are highly motivated to increase the level of collaboration with their counterparts in the business. Nowhere is this more critical than in the area of data quality, and the trend is causing a shift in the way companies are looking to solve their data quality issues. First-generation data quality tools had a natural focus on technology instead of business. Here are some of the differences between technology-focused data quality solutions and business-focused data quality solutions.

Tools vs. Process
Technology-focused data quality solutions provide tools that automate data processing. Evidence of this type of focus can be seen in the way that vendors will tout the sophistication and type of their algorithms over and above their ability to support ongoing data quality management processes. While technology is extremely important, its relevance cannot eclipse the overall data quality management process. Even if your data quality tool can automate the correction of 95 percent of the data, if the remaining five percent cannot be managed properly, you will continue to suffer from poor data quality.

[Read more]

Building the Business Case for Data Quality

Chris Cingrani

As a new contributor to the data quality blog site, I wanted to start by introducing myself and highlighting the types of topics I plan to discuss on a semi-frequent basis. I am a Principal Consultant with Informatica Professional Services and have spent the past 6 years in the data quality space in a variety of sales and post-sales roles. During this time I have seen the data quality market continue to evolve and mature. Thus, I would like to use this column to reflect on the types of use cases I have seen, and continue to see, when meeting with organizations faced with data quality problems. I hope these posts can start an active dialogue, regardless of whether your company is trying to tackle its first data quality initiative or looking to build out a formal center of excellence around data quality.

To start, I wanted to pose a common question I am often asked by clients and prospects – how do I build a business case for data quality? Although an organization may think (or even know) there is a problem, there is often still a need to justify the cost of procuring a data quality solution. This justification requirement often comes from the idea that data quality issues aren’t necessarily a core business issue (how wrong this is!) or are something that can be handled through manual intervention (this is true – if you have unlimited time and money, but even then your results will be limited). Thus, the following points are meant to help start an organization down the path to building the internal business case through a Data Quality Audit. Note - if you have access to Informatica’s Velocity Methodology, I go into these steps in further detail in the best practice document, “Developing the Data Quality Business Case.”

[Read more]

Hospital Billing Errors 'Kill' Patients

Larry English

Hospital billing mistakes have become so prevalent that a niche industry has evolved to help patients decipher their bills and help correct the errors. “Pat Palmer, founder of Medical Billing Advocates of America, estimates that she finds multiple errors in 8 out of every 10 hospital bills she reviews” (Dina ElBoghdady, “Killer Billing Errors,” Washington Post, June 27, 2004, p. F01. Accessed Aug 9, 2007 at: http://www.washingtonpost.com/wp-dyn/articles/A7351-2004Jun26.html ).

One patient saw the $25,652.14 bill for her 2-hour routine operation reduced to $17,000 billed to the insurance company, and her own cost reduced to $2,148, after errors and overcharges were corrected.

While this error rate was not statistically measured, nor would it apply to all hospital bills, it points to a huge problem in Information Quality in the health care system, not counting actual medical errors caused by information defects.

The consequence of health care overbilling is significant. It contributes to the high costs of health care, which is one major cause of personal bankruptcy.

[Read more]

Information Quality & Management Transformation

Larry English

I recently received an email from one of my early clients. After having worked in four different companies in four different industries, she came to a sad conclusion, writing:

“The thing that they all have in common is a desire to cut corners and deal with quality later. It takes a lot of energy to be the information quality cheerleader, and I find it discouraging and overwhelming at times. Keep writing your articles and books to encourage all the people like me who are dealing with these issues every day.” P. G.

What P. G. has discovered is, unfortunately, the norm, not the exception. There are two critical elements in this experience.

[Read more]

Alice in “Qualityland”

Alice: Would you tell me, please, which way I ought to go from here?
The Cheshire Cat: That depends a good deal on where you want to get to.
Alice: I don't much care where.
The Cheshire Cat: Then it doesn't much matter which way you go.
– Lewis Carroll, Alice's Adventures in Wonderland

Chris McCauley

When confronted with the problem of how to address their data quality issues, many organisations face a dilemma similar to the one that confronted Alice during her travels in Wonderland: “I know that I need to do something, but I don’t know where to start”. Knowing where to start and, equally importantly, the size of the problem as well as where an organisation needs to go are critical factors in ensuring that their data quality journey takes them where they need to be at the price they are prepared to pay.

When planning their “journey”, organisations need to address the issue of data quality holistically by considering each of the three DQ pillars in turn: firstly “People”, then “Ideas” and finally “Technology”. Many DQ initiatives have failed because the primary focus has been on delivering a technical solution. However, without the right framework in place, operated by the right people, this approach will never deliver the results that organisations need. Time and time again within the IT industry it has been proved that the pure application of technology will never solve business issues. Technology in itself will never win the “war”; it is always the right people with the right ideas who use the technology in the right way.

[Read more]

Mobilizing the Data Quality Army

Garry Moroney

I’ve just been reading a US Department of Education briefing document on improving data quality in education performance data. The report stresses the impact that low-quality data can have on measuring the success of education programs. It discusses, for example, the numerous data quality problems identified in the “No Child Left Behind” program established in 2001. The problems are typical: non-standardized data definitions, inconsistent data from different sources, data entry errors, and lack of timeliness.

The briefing document outlines a broad set of data quality guidelines to be implemented right across the education system in the US – at State level, in Local Education Agencies (LEAs) and in schools themselves. The three foundation stones of the data quality framework outlined are:

• suitable technical infrastructure,
• a comprehensive dictionary of data definitions, and
• staff ownership, organization and training.

[Read more]
