Data Integration - Informatica

Informatica Data Quality

Information Content Quality

Larry English


One of the root causes of poor-quality information is defects in the data definition, specifically in the "information product specifications." Because information is a product of our business, manufacturing and service processes, the analogy of an "information product" is real, and quality "information product specifications" are a critical requirement for Information Quality.

This blog is the second in a series of three on the critical quality characteristics (or measures) of information quality required in the TIQM Quality System:

  1. Information Product Specification Data Quality
  2. Information Content Quality
  3. Information Presentation Quality

Information Product Specification Quality Characteristics

  • Information standards
  • Data names
  • Data definitions
  • Attribute valid value set or range of values
  • Value format for structured attributes (VIN, SSN, Product Codes)
  • Business rule specifications of constraints on data
  • Information Steward accountable for data definition quality
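To make the specification items above concrete, here is a minimal Python sketch of how one entry in an information product specification might be recorded and checked. The `AttributeSpec` class, its field names, and the sample entries (including the steward titles) are hypothetical illustrations, not part of TIQM or any Informatica product. The VIN and SSN patterns check format only, not whether a number was actually issued, and business rules that span several attributes are left out to keep it short.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeSpec:
    """One attribute entry in a hypothetical information product specification."""
    name: str                              # standard data name
    definition: str                        # business definition of the fact
    steward: str                           # Information Steward accountable for the definition
    valid_values: Optional[set] = None     # enumerated valid value set, if any
    value_range: Optional[tuple] = None    # inclusive (low, high) range, if any
    format_pattern: Optional[str] = None   # regular expression for structured values

    def conforms(self, value) -> bool:
        """Return True if the value satisfies every constraint this spec defines."""
        if self.valid_values is not None and value not in self.valid_values:
            return False
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                return False
        if self.format_pattern is not None:
            if re.fullmatch(self.format_pattern, str(value)) is None:
                return False
        return True


# Illustrative entries; the definitions and steward titles are hypothetical.
VIN = AttributeSpec(
    name="vehicle_identification_number",
    definition="The 17-character identifier assigned to a vehicle by its manufacturer.",
    steward="Vehicle Data Steward",
    format_pattern=r"[A-HJ-NPR-Z0-9]{17}",  # 17 characters; letters I, O and Q are not used
)
SSN = AttributeSpec(
    name="social_security_number",
    definition="The US Social Security Number assigned to the person.",
    steward="Customer Data Steward",
    format_pattern=r"\d{3}-\d{2}-\d{4}",    # NNN-NN-NNNN format only
)
ORDER_STATUS = AttributeSpec(
    name="order_status_code",
    definition="The current lifecycle state of a customer order.",
    steward="Order Data Steward",
    valid_values={"OPEN", "SHIPPED", "CLOSED"},
)
ITEM_WEIGHT_GRAMS = AttributeSpec(
    name="item_weight_grams",
    definition="The shipping weight of one unit of the product, in grams.",
    steward="Product Data Steward",
    value_range=(0.1, 50000.0),
)
```

For example, `VIN.conforms("1HGCM82633A004352")` returns `True`, while `SSN.conforms("123456789")` returns `False` because the separators the format requires are missing.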

Information Content Quality Characteristics: The major information content (data values) quality characteristics include:

  • Definition conformance. Data values are consistent with the attribute (fact) definition
  • Completeness. Each process or decision has all the information it requires
    • Record completeness. A record exists for every real world object or event the enterprise needs to know about
    • Value completeness. A given data element (fact) has a value stored for all records that should have a value
  • Validity. Data values conform to the information product specifications
    • Value validity. A data value is a valid value or within a specified range of valid values for this data element
    • Business rule validity. Data values conform to the specified business rules
    • Derivation validity. A derived or calculated data value is produced correctly according to a specified calculation formula or set of derivation rules. If the base values are accurate and the calculation is performed correctly, then the result will be accurate
  • Accuracy. Data values are correct.
    • Accuracy to surrogate source. The data agrees with an original, corroborative source record of data, such as a notarized birth certificate, document, or unaltered electronic data received from a party outside the control of the organization that is demonstrated to be a reliable source
    • Accuracy to reality. The data correctly reflects the characteristics of a real-world object or event being described. Accuracy and precision represent the highest degree of inherent information quality possible
  • Precision. Data values are correct to the right level of detail, such as price to the penny or weight to the nearest tenth of a gram
  • Non-duplication. There is only one record in a database representing a given real-world object or event
  • Source quality warranties/certifications. The source of information: (1) guarantees the quality of information it provides with remedies for non-compliance; (2) documents its certification in its information quality management capabilities to capture, maintain, and deliver quality information; or (3) provides objective and verifiable measures of the quality of information it provides in agreed-upon quality characteristics
  • Equivalence of redundant or distributed data. Data in one database is semantically equivalent to data about the same objects or events in another database
  • Concurrency of redundant or distributed data. The information float, or lag time, is minimal between (a) when data is knowable (created or changed) in one database and (b) when it is also knowable in a redundant or distributed database, and concurrent queries to each database produce the same result
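Several of these content characteristics can be measured directly from the data once the specification is known. The minimal Python sketch below assumes a small set of hypothetical order records and illustrative rules, and computes value completeness, value validity, business rule validity, derivation validity, and a simple non-duplication count. Accuracy is deliberately absent: it cannot be established from the data alone and requires comparison with a surrogate source or with reality itself.

```python
from collections import Counter

# Hypothetical order records; field names, values and rules are illustrative only.
records = [
    {"id": 1, "customer": "ACME CORP", "status": "OPEN",    "qty": 10, "unit_price": 2.50, "total": 25.00},
    {"id": 2, "customer": "Acme Corp", "status": "SHIPPED", "qty": 4,  "unit_price": 5.00, "total": 21.00},
    {"id": 3, "customer": None,        "status": "PENDING", "qty": 3,  "unit_price": 1.00, "total": 4.00},
    {"id": 4, "customer": "Beta Ltd",  "status": "CLOSED",  "qty": 0,  "unit_price": 9.99, "total": 0.00},
]
VALID_STATUS = {"OPEN", "SHIPPED", "CLOSED"}


def value_completeness(rows, attr):
    """Value completeness: share of records with a value stored for `attr`."""
    return sum(r[attr] is not None for r in rows) / len(rows)


def value_validity(rows, attr, valid_values):
    """Value validity: share of records whose `attr` value is in the valid value set."""
    return sum(r[attr] in valid_values for r in rows) / len(rows)


def business_rule_validity(rows, rule):
    """Business rule validity: share of records satisfying `rule` (a predicate)."""
    return sum(bool(rule(r)) for r in rows) / len(rows)


def derivation_validity(rows, tolerance=0.005):
    """Derivation validity: share of records whose total equals qty * unit_price to the penny."""
    return sum(abs(r["qty"] * r["unit_price"] - r["total"]) <= tolerance for r in rows) / len(rows)


def surplus_duplicates(rows, match_key):
    """Non-duplication: number of extra records sharing a match key with another record."""
    counts = Counter(match_key(r) for r in rows)
    return sum(n - 1 for n in counts.values())


print(value_completeness(records, "customer"))                 # 0.75
print(value_validity(records, "status", VALID_STATUS))          # 0.75
print(business_rule_validity(records, lambda r: r["qty"] > 0))  # 0.75
print(derivation_validity(records))                             # 0.5
print(surplus_duplicates(records, lambda r: (r["customer"] or "").upper()))  # 1
```

Here the customer field is 75 percent complete, and the derivation check fails for half the records because their totals disagree with quantity times unit price. The upper-casing match key is only a crude stand-in for real matching, which must cope with misspellings, abbreviations and differing formats.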

For more about Information Content Quality, see Chapter 6, "Assessing Information Quality," in Improving Data Warehouse and Business Information Quality. This contains a more comprehensive list of quality characteristics with examples. It also describes how to measure these quality characteristics. The next blog will discuss information presentation quality characteristics required for the finished Information Product presented to the knowledge workers.

What do you think? Share your experiences in measuring information content quality, especially accuracy.

Data Quality Maturity Model - How Does Your Organization Rate?

Chris Cingrani

Recently I spoke at a User Group Meeting on the topic “Align for Success: The critical part Data Quality plays in complex Business and IT Initiatives.” I began the discussion by polling the group to find out how many of the organizations represented had a data quality solution in place. The response to the question was mixed, with approximately half the audience indicating they either had a solution or were considering one, while the other half indicated they weren’t currently considering data quality (or the person was unaware of any data quality initiatives). Although this was a very unscientific survey, it set the tone for my presentation, as I attempted to explain the concept of a data quality maturity model. By understanding where an organization is today from the standpoint of the model, management can begin to develop plans as to where they want to end up both in the short and long term.

You can't have CDI without Data Quality

Tom Golden

Looking at Webopedia.com recently, I came across a definition for CDI. Yes, Webopedia.com: it bills itself as the #1 online encyclopedia dedicated to computer technology. You might wonder what I was doing surfing this font of knowledge. Well, I had time on my hands between delayed flights coming back to Europe from the US. You know what they say: "time to spare, travel by air."

The Webopedia.com CDI definition went: "Short for Customer Data Integration, it is the combination of the technology, processes, and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines, and, potentially, enterprises, where there are multiple sources of customer data in multiple application systems and databases."

A bit long-winded perhaps, but the three words that shone out at me through the glare of the fluorescent lights in San Francisco Airport were "accurate, timely and complete": all data quality issues. Despite this, few if any of the Customer Data Integration (CDI) vendors in the market today have truly addressed the data quality issues in their CDI solutions. And anyone who has gone down the route of developing their own custom-built CDI application will be all too familiar with the data quality demands involved.

Better management through measuring data quality

Ivan Chong

I recently asked a customer of ours why they invested so much in monitoring and publishing key performance indicators for their data quality. “Believe it or not, the biggest reason we measure data quality is not to correct bad data,” came the reply. “The reason we monitor data quality is to detect problems with our business processes.”

Indeed, as I mentioned in my last blog post, business users look to investments in people and processes, in addition to technology, to address poor data quality. For example, if a bank branch manager received a report showing that customer data originating from his branch office had a much higher incidence of duplicate entries and was putting the entire bank at risk of massive regulatory fines, he would not throw technology at the problem. His response might be mandatory training for tellers or better hiring practices to screen for adequate computer skills.

Experts in quality control methodology refer to this as addressing “root cause.” Common starting points for measurement are completeness, accuracy, consistency, conformity, duplication, and integrity. Eventually, as the business culture matures its data quality practices, timeliness and data lineage (origination) are also used to evaluate the quality of data. Of course, software technology that automates the process of parsing, standardizing, matching and consolidating data is of immense value and is an absolute requirement in any data integration project. However, the issue of data quality goes beyond these IT projects. Ongoing measurement and monitoring of data quality provides value directly to the business because it helps the business better manage its people and processes.
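The branch-manager example above amounts to tracking a duplication KPI per originating process. The sketch below is one hypothetical way to do that in Python: it attributes each duplicate customer record to the branch that entered it and flags branches whose rate exceeds an alert threshold. The record layout, the name-plus-date-of-birth match key, and the 5 percent threshold are all assumptions for illustration.

```python
from collections import Counter

ALERT_THRESHOLD = 0.05  # hypothetical: flag branches where over 5% of entries are duplicates


def match_key(record):
    """Crude match key: upper-cased, whitespace-collapsed name plus date of birth."""
    return (" ".join(record["name"].upper().split()), record["dob"])


def duplicate_rate_by_branch(rows):
    """For each branch, the fraction of the records it entered that duplicate
    a customer already on file. Assumes `rows` is in order of entry."""
    seen = set()
    entered = Counter()
    duplicates = Counter()
    for r in rows:
        entered[r["branch"]] += 1
        key = match_key(r)
        if key in seen:
            duplicates[r["branch"]] += 1
        else:
            seen.add(key)
    return {branch: duplicates[branch] / n for branch, n in entered.items()}


rows = [
    {"branch": "North", "name": "Jane Smith",  "dob": "1980-02-01"},
    {"branch": "North", "name": "JANE  SMITH", "dob": "1980-02-01"},  # duplicate
    {"branch": "North", "name": "Raj Patel",   "dob": "1975-07-12"},
    {"branch": "North", "name": "raj patel",   "dob": "1975-07-12"},  # duplicate
    {"branch": "South", "name": "Ana Lopez",   "dob": "1990-11-30"},
    {"branch": "South", "name": "Tom Reed",    "dob": "1968-04-05"},
    {"branch": "South", "name": "Lee Wong",    "dob": "1985-09-09"},
    {"branch": "South", "name": "Kim Park",    "dob": "1972-03-03"},
]

rates = duplicate_rate_by_branch(rows)
flagged = sorted(b for b, rate in rates.items() if rate > ALERT_THRESHOLD)
print(rates)    # {'North': 0.5, 'South': 0.0}
print(flagged)  # ['North']
```

With this sample data, North entered two of its four records as duplicates and is flagged, while South's rate is zero. The flag says where to look at people and processes; it does not by itself say which records to fix.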

Start small with monitoring, but always think big to achieve data quality goals

Tom Golden

I attended my first parent-teacher meeting the other day for my five-year-old daughter. Another one of those “life stage” events done and dusted – I remember dreading the annual meeting when I was a kid. The notion of my parents and my teacher comparing notes on my behaviour was too much to bear – somebody was eventually going to put two and two together and find out I was up to no good.

It all got me thinking about a recent blog post by my esteemed colleague Garry Moroney. His post “Mobilizing the Data Quality Army” outlined the level of effort, thought and planning that the US Department of Education is putting into data quality.

As Garry points out, dealing with data quality in a large, disconnected organization such as the US school system is not a trivial exercise. But if you were to read only that one post, you might be overwhelmed by the potential size of the data quality task in front of you.

Business and IT Collaboration is Essential for Data Quality

Ivan Chong

A recent InformationWeek article* described the growth in IT employment across the US as a result of a shift in skills. Rather than focusing on pure IT proficiency, organizations are looking for talent with “a more hybrid mix of technology skills, along with an understanding of the business and its customers.”

IT departments are highly motivated to increase the level of collaboration with their counterparts in the business. Nowhere is this more critical than in the area of data quality, and the trend is causing a shift in the way companies are looking to solve their data quality issues. First-generation data quality tools had a natural focus on technology instead of business. Here are some of the differences between technology-focused data quality solutions and business-focused data quality solutions.

Tools vs. Process
Technology-focused data quality solutions provide tools that automate data processing. Evidence of this focus can be seen in the way vendors tout the sophistication and type of their algorithms over and above their ability to support ongoing data quality management processes. While technology is extremely important, its relevance cannot eclipse the overall data quality management process. Even if your data quality tool can automate the correction of 95 percent of the data, if the remaining five percent cannot be managed properly, you will continue to suffer from poor data quality.
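The 95/5 point can be illustrated with a minimal sketch of a cleansing step that does not discard what it cannot fix. In this hypothetical Python example, a small lookup table standardizes state names, and any record the lookup cannot resolve is routed to an exception list for a data steward instead of being passed through unchanged. The lookup table, field names, and `cleanse` function are illustrative assumptions, not a description of any vendor's tool.

```python
from dataclasses import dataclass, field

# Hypothetical reference data for the automated step.
STATE_CODES = {
    "CA": "CA", "CALIFORNIA": "CA", "CALIF": "CA",
    "NY": "NY", "NEW YORK": "NY", "N.Y.": "NY",
}


@dataclass
class CleansingResult:
    corrected: list = field(default_factory=list)   # records fixed automatically
    exceptions: list = field(default_factory=list)  # records routed to a data steward


def cleanse(records):
    """Standardize the state field; route anything unrecognized to the exception list."""
    result = CleansingResult()
    for rec in records:
        code = STATE_CODES.get(rec["state"].strip().upper())
        if code is not None:
            result.corrected.append({**rec, "state": code})
        else:
            # Not silently passed through: the record stays visible for human review.
            result.exceptions.append({**rec, "reason": "unrecognized state value"})
    return result


def automation_rate(result):
    """Fraction of records the automated step could correct on its own."""
    total = len(result.corrected) + len(result.exceptions)
    return len(result.corrected) / total if total else 1.0


records = [{"id": i, "state": "California"} for i in range(19)] + [{"id": 19, "state": "Cali"}]
result = cleanse(records)
print(automation_rate(result))  # 0.95
print(result.exceptions)        # [{'id': 19, 'state': 'Cali', 'reason': 'unrecognized state value'}]
```

In this run, 19 of 20 records are corrected automatically, and the one record with the value "Cali" lands in the exception list with a reason attached. That list is where the ongoing management process has to take over.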

Building the Business Case for Data Quality

Chris Cingrani

As a new contributor to the data quality blog site, I wanted to start by introducing myself and highlighting the types of topics I plan to discuss on a semi-frequent basis. I am a Principal Consultant with Informatica Professional Services and have spent the past 6 years in the data quality space in a variety of sales and post-sales roles. During this time I have seen the data quality market continue to evolve and mature. Thus, I would like to use this column to reflect on the types of use cases I have seen, and continue to see, when meeting with organizations faced with data quality problems. I hope these posts can start an active dialogue, regardless of whether your company is tackling its first data quality initiative or looking to build out a formal center of excellence around data quality.

To start, I wanted to pose a common question I am often asked by clients and prospects – how do I build a business case for data quality? Although an organization may think (or even know) there is a problem, there is often a need to justify the cost of procuring a data quality solution. This justification requirement often stems from the idea that data quality issues aren’t necessarily a core business issue (how wrong this is!) or that they can be handled through manual intervention (this is true – if you have unlimited time and money, but even then your results will be limited). Thus, the following points are meant to help start an organization down the path to building the internal business case through a Data Quality Audit. Note: if you have access to Informatica’s Velocity Methodology, I go into these steps in further detail in the best practice document, “Developing the Data Quality Business Case.”

Information Quality & Management Transformation

Larry English

I recently received an email from one of my early clients. After having worked in four different companies in four different industries, she came to a sad conclusion, writing:

“The thing that they all have in common is a desire to cut corners and deal with quality later. It takes a lot of energy to be the information quality cheerleader, and I find it discouraging and overwhelming at times. Keep writing your articles and books to encourage all the people like me who are dealing with these issues every day.” P. G.

The discovery that P. G. has experienced is, unfortunately, the norm—not the exception. There are two critical elements in this experience.


Alice in "Qualityland"

Alice: Would you tell me, please, which way I ought to go from here?
The Cheshire Cat: That depends a good deal on where you want to get to
Alice: I don't much care where.
The Cheshire Cat: Then it doesn't much matter which way you go
– Lewis Carroll, Alice's Adventures in Wonderland

Chris McCauley

When confronted with the problem of how to address their data quality issues many organisations are faced with a similar dilemma to that which confronted Alice during her travels in Wonderland; “I know that I need to do something, but I don’t know where to start”. Knowing where to start and, equally importantly, the size of the problem as well as where an organisation needs to go are critical factors in ensuring that their data quality journey takes them where they need to be at the price they are prepared to pay.

When planning their “journey”, organisations need to address the issue of data quality holistically by considering each of the three DQ pillars in turn: firstly “People”, then “Ideas” and finally “Technology”. Many DQ initiatives have failed because the primary focus has been on delivering a technical solution. However, without the right framework in place, operated by the right people, this approach will never deliver the results that organisations need. Time and time again within the IT industry it has been proved that the pure application of technology will never solve business issues; technology in itself will never win the “war”. It is always the right people with the right ideas who use the technology in the right way.

Mobilizing the Data Quality Army

I’ve just been reading a US Department of Education briefing document on improving data quality in education performance data. The report stresses the impact that low-quality data can have on measuring the success of education programs. It discusses, for example, the numerous data quality problems identified in the “No Child Left Behind” program established in 2001. The problems are typical: non-standardized data definitions, inconsistent data from different sources, data entry errors, and lack of timeliness.

The briefing document outlines a broad set of data quality guidelines to be implemented right across the education system in the US – at State level, in Local Education Agencies (LEAs) and in schools themselves. The three foundation stones of the data quality framework outlined are:

• suitable technical infrastructure
• a comprehensive dictionary of data definitions
• staff ownership, organization and training
