Data Integration - Informatica

Informatica Data Quality

Information Presentation Quality

Larry English

"Information Presentation Quality Characteristics"

This blog is the third and last of a series of blogs on the critical-to-quality characteristics of information quality required to achieve Total Information Quality Management. For information to have quality to knowledge workers:

  • It must be clearly defined so knowledge workers understand its meaning
  • It must be complete, accurate, and consistent across all data stores
  • It must be accessed and presented on a timely basis, and in an unbiased way that reveals the truth, so that the knowledge workers can take the right action or make the right decision

The last set of quality characteristics that knowledge workers require is presentation quality characteristics, which we discuss here.

  1. Information Product Specification Data Quality
  2. Information Content Quality
  3. Information Presentation Quality

It is a fatal mistake to measure only the quality of the data content to determine Information Quality. Many process and decision failures result from poor quality presentation of the information.

Presentation quality is part of the human-machine interface. Presentation quality characteristics represent the "look and feel" of the finished information product. These characteristics are not just the prettiness or flashiness of the information presented; they represent the degree to which the information communicates the message in the data accurately and clearly to information consumers so they can perform their work effectively.

Information Presentation Quality Characteristics:

A.1.1 Quality Characteristics of Information Presentation

Knowledge workers require different quality characteristics based on their need for the information. Based on my work with dozens of clients, the major information presentation (delivery or communication to information consumers) quality characteristics include:

  • Availability. Information is accessible when it is needed
  • Accessibility. Being able to get the information when needed
  • Presentation Media Appropriateness. Being presented in the right technology medium, such as online, hardcopy report, audio, or video
  • Relevancy. Information is appropriate for the task at hand, i.e., information required to perform a process or make a decision
  • Presentation Standardization. Formatted data is presented consistently in a standardized way across different media, such as in computer screens, generated reports, or manually prepared reports
  • Structured Values. Structured attributes like dates, time, telephone numbers, tax id numbers, product codes, and currency amounts should be presented in a consistent, standard way in any presentation. When numbers and identifiers are chunked, such as standard phone number formats (e.g., [1] (615) 837-1211) they are easier to remember and use
  • Structured Documents. Repeating reports should have a standard format with a style sheet that presents the information in a format that is consistent, easy to read, and easy to understand

Documents should use readability-enhancing techniques such as:

  • Information chunking
  • Use of simple words
  • Short sentences with active verbs
  • Bulleted items for lists
  • A readability index of three grade levels below the reading audience

Methods such as "Information Mapping" help improve readability of documents.

  • Presentation Clarity. Information is presented in a way that communicates the truth of the information. Clear labels, footnotes, other explanatory notes, references, or links to definitions and/or documentation that clearly communicate the meaning and any anomalies in the information enhance presentation clarity

    Changes in data definition or in business rule specification can make comparisons of information across time boundaries inaccurate

  • Signage Clarity. Signs and other information-bearing mechanisms like traffic signals should be standardized and made universal across the broadest audience possible

    Traffic signal lights are now standardized globally with red (stop), yellow (caution), and green (go) meanings. Furthermore, traffic signal lights have standard placements with red on top and green at the bottom for people with color-blindness, so that meaning is consistently associated with the position. The "redundancy" in this message system reduces error in those affected by color-blindness

  • Presentation Objectivity. Information is presented without bias, enabling the knowledge worker to understand the meaning and significance without misinterpretation

    Numeric or quantitative data often requires graphical presentation. Objectivity means that the graphical or visual presentation of the information does NOT distort the truth as evidenced in the data

  • Presentation Utility. Information is presented in a way that is intuitive and appropriate for the task at hand. The presentation of information will vary by the individual uses for which it is required. Some uses require concise presentation, while others require a complete, detailed presentation, and yet others require graphics, color-coding, or other highlighting techniques
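To make the Structured Values characteristic above concrete, here is a minimal sketch in Python of presenting telephone numbers in one chunked, standard format regardless of how they were captured. The function name format_nanp_phone and the choice of the North American "(AAA) EEE-NNNN" layout are assumptions made for illustration; an organization would substitute its own presentation standard.

```python
import re


def format_nanp_phone(raw: str) -> str:
    """Present a North American phone number as '(AAA) EEE-NNNN'.

    Hypothetical helper for illustration; the target format is an assumption.
    """
    digits = re.sub(r"\D", "", raw)  # keep the digits, drop all punctuation
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code of 1
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {raw!r}")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


# Three different capture formats, one standard presentation.
for raw in ["6158371211", "615.837.1211", "1-615-837-1211"]:
    print(format_nanp_phone(raw))
```

Applying one such formatter at every point of presentation (screens, generated reports, exports) is one way to get both Structured Values and Presentation Standardization, because the stored value and the displayed format are kept separate.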

For more about Information Presentation Quality, see Chapter 6, "Assessing Information Quality," in Improving Data Warehouse and Information Quality. This contains a more comprehensive list of quality characteristics with examples. It also describes how to measure these quality characteristics.

What do you think? Share your experiences in measuring or improving information presentation quality.

Information Content Quality

Larry English

Information Content Quality Characteristics

One of the root causes of poor quality information is defects in the data definition, specifically the "information product specifications." Because information is a product of our business, manufacturing and service processes, the analogy of an "information product" is real, and the requirement for quality in "information product specifications" is a critical requirement for Information Quality.

This blog is the second of a series of three blogs on the critical quality characteristics (or measures) of information quality required on the TIQM Quality System.

  1. Information Product Specification Data Quality
  2. Information Content Quality
  3. Information Presentation Quality

Information Content Quality Characteristics

  • Information standards
  • Data names
  • Data definitions
  • Attribute valid value set or range of values
  • Value format for structured attributes (VIN, SSN, Product Codes)
  • Business rule specifications of constraints on data
  • Information Steward accountable for data definition quality

Information Content Quality Characteristics: The major information content (data values) quality characteristics include:

  • Definition conformance. Data values are consistent with the attribute (fact) definition
  • Completeness. Each process or decision has all the information it requires
    • Record completeness. A record exists for every real world object or event the enterprise needs to know about
    • Value completeness. A given data element (fact) has a value stored for all records that should have a value
  • Validity. Data values conform to the information product specifications
    • Value validity. A data value is a valid value or within a specified range of valid values for this data element
    • Business rule validity. Data values conform to the specified business rules
    • Derivation validity. A derived or calculated data value is produced correctly according to a specified calculation formula or set of derivation rules. If the base values are accurate and the calculation is correctly performed, then the result will be accurate
  • Accuracy. Data values are correct.
    • Accuracy to surrogate source. The data agrees with an original, corroborative source record of data, such as a notarized birth certificate, document, or unaltered electronic data received from a party outside the control of the organization that is demonstrated to be a reliable source
    • Accuracy to reality. The data correctly reflects the characteristics of a real-world object or event being described. Accuracy and precision represent the highest degree of inherent information quality possible
  • Precision. Data values are correct to the right level of detail, such as price to the penny or weight to the nearest tenth of a gram
  • Non-duplication. There is only one record in a database representing a given real-world object or event
  • Source quality warranties/certifications. The source of information: (1) guarantees the quality of information it provides with remedies for non-compliance; (2) documents its certification in its information quality management capabilities to capture, maintain, and deliver quality information; or (3) provides objective and verifiable measures of the quality of information it provides in agreed-upon quality characteristics
  • Equivalence of redundant or distributed data. Data in one database is semantically equivalent to data about the same objects or events in another database
  • Concurrency of redundant or distributed data. The information float or lag time is minimal between (a) when data is knowable (created or changed) in one database and (b) when it is also knowable in a redundant or distributed database, and concurrent queries to each database produce the same result
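
Several of the characteristics above can be measured directly from the data. The following is a minimal sketch in Python, not a TIQM-prescribed method: the sample records, the field names, and the 1900 to 2025 valid range for birth_year are all invented for illustration. It measures value completeness, value validity, and non-duplication.

```python
from collections import Counter

# Hypothetical sample records; field names and values are assumptions.
records = [
    {"customer_id": 1, "email": "ann@example.com", "birth_year": 1980},
    {"customer_id": 2, "email": None, "birth_year": 1975},
    {"customer_id": 3, "email": "bob@example.com", "birth_year": 1875},
    {"customer_id": 3, "email": "bob@example.com", "birth_year": 1875},
]


def has_value(row, field):
    """True when the record stores a non-empty value for the field."""
    return row.get(field) not in (None, "")


def value_completeness(rows, field):
    """Share of records that have a stored value for the field."""
    return sum(1 for r in rows if has_value(r, field)) / len(rows)


def value_validity(rows, field, is_valid):
    """Share of stored values that pass the given validity rule."""
    values = [r[field] for r in rows if has_value(r, field)]
    if not values:
        return 1.0  # nothing stored, so nothing violates the rule
    return sum(1 for v in values if is_valid(v)) / len(values)


def duplicate_keys(rows, key):
    """Key values that identify more than one record."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


print(value_completeness(records, "email"))  # 0.75
print(value_validity(records, "birth_year", lambda y: 1900 <= y <= 2025))  # 0.5
print(duplicate_keys(records, "customer_id"))  # [3]
```

Note what the sketch cannot do: a value can be complete and valid and still be wrong. Measuring accuracy requires comparing the data with the real-world object or with a reliable surrogate source, as defined above.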

For more about Information Content Quality, see Chapter 6, "Assessing Information Quality," in Improving Data Warehouse and Information Quality. This contains a more comprehensive list of quality characteristics with examples. It also describes how to measure these quality characteristics. The next blog will discuss information presentation quality characteristics required for the finished Information Product presented to the knowledge workers.

What do you think? Share your experiences in measuring information content quality, especially accuracy.

Start small with monitoring, but always think big to achieve data quality goals

Tom Golden

I attended my first parent-teacher meeting the other day for my five-year-old daughter. Another one of those “life stage” events done and dusted – I remember dreading the annual meeting when I was a kid. The notion of my parents and my teacher comparing notes on my behaviour was too much to bear – somebody was eventually going to put two and two together and find out I was up to no good.

It all got me thinking about a recent blog post by my esteemed colleague Garry Moroney. His post Mobilizing the Data Quality Army outlined the level of effort, thought and planning that the US Department of Education is putting into data quality.

As Garry points out, dealing with data quality in a large, disconnected organization such as the US schools system is not a trivial exercise. But if you were to read only that one post, you might be overwhelmed by the potential size of the data quality task in front of you.

Business and IT Collaboration is Essential for Data Quality

Ivan Chong

A recent InformationWeek article* described the growth in IT employment across the US as a result of a shift in skills. Rather than focusing on pure IT proficiency, organizations are looking for talent with “a more hybrid mix of technology skills, along with an understanding of the business and its customers.”

IT departments are highly motivated to increase the level of collaboration with their counterparts in the business. Nowhere is this more critical than in the area of data quality, and the trend is causing a shift in the way companies are looking to solve their data quality issues. First-generation data quality tools had a natural focus on technology instead of business. Here are some of the differences between technology-focused data quality solutions and business-focused data quality solutions.

Tools vs. Process
Technology-focused data quality solutions provide tools that automate data processing. Evidence of this type of focus can be seen in the way that vendors will tout the sophistication and type of their algorithms over and above their ability to support ongoing data quality management processes. While technology is extremely important, its relevance cannot eclipse the overall data quality management process. Even if your data quality tool can automate the correction of 95 percent of the data, if the remaining five percent cannot be managed properly, you will continue to suffer from poor data quality.

Information Quality & Management Transformation

Larry English

I recently received an email from one of my early clients. After having worked in four different companies in four different industries, she came to a sad conclusion, writing:

“The thing that they all have in common is a desire to cut corners and deal with quality later. It takes a lot of energy to be the information quality cheerleader, and I find it discouraging and overwhelming at times. Keep writing your articles and books to encourage all the people like me who are dealing with these issues every day.” P. G.

The discovery that P. G. has experienced is, unfortunately, the norm—not the exception. There are two critical elements in this experience.


Alice in "Qualityland"

Alice: Would you tell me, please, which way I ought to go from here?
The Cheshire Cat: That depends a good deal on where you want to get to
Alice: I don't much care where.
The Cheshire Cat: Then it doesn't much matter which way you go
– Lewis Carroll, Alice's Adventures in Wonderland

Chris McCauley

When confronted with the problem of how to address their data quality issues many organisations are faced with a similar dilemma to that which confronted Alice during her travels in Wonderland; “I know that I need to do something, but I don’t know where to start”. Knowing where to start and, equally importantly, the size of the problem as well as where an organisation needs to go are critical factors in ensuring that their data quality journey takes them where they need to be at the price they are prepared to pay.

When planning their “journey” organisations need to address the issue of data quality holistically by considering each of the three DQ pillars in turn; firstly “People”, then “Ideas” and finally “Technology”. Many DQ initiatives have failed because the primary focus has been on delivering a technical solution. However, without the right framework in place and operated by the right people, this approach will never deliver the results that organisations need. Time and time again within the IT industry it has been proved that the pure application of technology will never solve business issues. Technology in itself will never win the “war”; it is always the right people with the right ideas who use the technology in the right way.

Mobilizing the Data Quality Army

I’ve just been reading a US Department of Education briefing document on improving data quality in education performance data. The report stresses the impact that low quality data can have on measuring the success of education programs. It discusses, for example, the numerous data quality problems identified in the “No Child Left Behind” program established in 2001. The problems are typical – non-standardized data definitions, inconsistent data from different sources, data entry errors, lack of timeliness.

The briefing document outlines a broad set of data quality guidelines to be implemented right across the education system in the US – at State level, in Local Education Agencies (LEAs) and in schools themselves. The three foundation stones of the data quality framework outlined are:

• suitable technical infrastructure
• a comprehensive dictionary of data definitions
• staff ownership, organization and training

If all master data was like customer data . . .

Garry Moroney

Managing the quality of customer data has its challenges: It is typically collected from a wide range of sources and channels, and very often those responsible for entering or capturing data have no incentive to do so accurately. Even if they do, much of it, including contact data, can go out of date rapidly. Despite this, most customer data has one major advantage over many other types of data: agreed and accepted standards and reference data. While these standards do vary from country to country, they are at least universally understood and have an enormous impact on the approach and the effort required for managing data quality.

Because of these global standards and references, there is general agreement on what a complete, valid and correctly formatted address should look like - likewise person or business name, telephone number, date of birth, email address etc. This means that if I am sharing my customer data with my business partners, at least we have a common view of what high quality data should look like and the checks we need to make to assess the quality levels.

Another huge benefit is that third-party service providers and technology vendors also understand the requirements and standards by which to measure and improve customer data, and they know that these requirements are largely the same for all vendors. As a result, large numbers of service bureaux and technology vendors are able to offer well developed, generic, out-of-the-box products and services to tackle customer data quality issues. These can deliver a lot of value with minimal or no customization, and the effort to acquire and implement these solutions is small.

Valuing Data Quality

Garry Moroney

Determining the aggregated return on investment for a data quality management initiative is notoriously difficult. Typically a minimum or partial ROI can be estimated by reference to the impact of low quality data on one or two key projects or processes. For example, in a CRM project, data quality ROI can be tied to reductions in customer contact failures and increased sales due to high quality segmentation. But given that the same set of master data will be used more than once in most organizations (e.g. customer master data will also be used in the billing system, the supply chain system and so on) and will add value (or destroy value!) in all of these processes, basing your ROI calculations on a single system or process will always underestimate the true returns.

For an organization trying to estimate the total returns across the enterprise from a data quality initiative, there are two difficult questions that must be addressed:

• How valuable is this dataset to the enterprise - assuming 100% data quality?
• How does its value decrease as quality erodes?

While these questions might at first seem unanswerable, it is worth noting that these are not unusual questions for a business to ask. In fact, businesses need to be able to answer questions of worth and depreciation for all their tangible assets - property, stock etc.

Unfortunately, data is one of those intangible assets where normal valuation approaches like recorded cost or replacement value are ineffective. But there are other intangible assets such as IPR, work-in-progress, customer and partner relationships (goodwill) where significant research has been done to develop effective valuation methodologies. It just might be possible to leverage these methodologies to value your data. For example, the value of customer data is directly related to the value of the customers themselves, and so "customer lifetime value" methodologies should be applicable in estimating the value of customer data and the extent to which this value varies with data quality.

Have any of you out there attempted to put a real value on your company data in this way? Perhaps you'd be willing to share your experiences with us.

For more information on building a business case for data quality and calculating potential return on investment see the Informatica white papers: Data Quality Profiling Calculating ROI for Data Migration and Data Integration Projects and The Data Quality Business Case—Projecting Return on Investment.

Business-Focused Data Quality

Garry Moroney

It’s coming up to the end of my first year as Head of Informatica’s Data Quality Division and what a year it’s been. Data Quality has long been an obsession of mine – long before Informatica acquired the data quality software company I headed up, Similarity Systems, and even before my colleagues and I founded Similarity six years earlier.

We set up Similarity Systems in 2000 because of our absolute belief that data quality was on the cusp of making a breakthrough as one of the critical performance drivers for large businesses and organizations everywhere.

Over the years since then we have seen data quality move rapidly up the agenda. Data was once the sole preserve of IT – but today boardroom executives have found reason to talk about and care about data and data quality. Good data quality can be the foundation for success, while poor data quality is a root cause of failure in many of the key initiatives for today’s businesses and government organizations. These executives know customer service is a data quality issue, compliance is a data quality issue, supply chain automation is a data quality issue … I could go on, but there will be lots of time for that later.

My goal in writing this blog is to share views and experience with others who are passionate about data quality. I see it as a forum for widening understanding of the enormous business value that can be generated through active, effective data quality management.

My days revolve around meeting with data quality customers, meeting with partners and working with our own product development and implementation specialists. Through this blog I hope to share some of the insights and experience garnered from this day-to-day interaction with these groups. And hopefully my conversations with these groups will be influenced by the feedback I receive from you through this blog.

I have set myself only two guidelines. I’ll be aiming to stick unerringly to them:
• Keep it short (because I’m busy and you probably are too)
• Keep it business focused (because data quality is a business problem opportunity)

Until next time…
