Data Integration - Informatica

Informatica Data Quality

National Security vs. Privacy Rights - the Role for Technology

Ivan Chong

I ran across an interesting article concerning the US initiative to broker data exchange with various EU nations. The intent is to gain greater access to information that would help in the global war on terror.

European governments are entering into these agreements much more readily than they were four, five years ago, because concerns about terrorism are no longer confined to one side of the Atlantic.

The article then highlights the concerns over violation of personal privacy rights and the potential for abuse.

The agreement, which was described by two European officials, also allows for the transmission of "personal data revealing racial or ethnic origin, political opinion or religious or other beliefs, trade union membership or information concerning health and sexual life" in cases where they are "particularly relevant to the purposes of this agreement." It defines personal data as "any information relating to an identified or identifiable natural person."

The technology challenge can be so consuming that we devote scant attention to the ethical issues involved. Data integration and identity resolution technology are continually advancing. By factoring ethical and moral considerations into the development of the technology, we should be able to support both objectives. Privacy and security need not be requirements that trade off against each other. In identity resolution, for example, the technology readily supports masking of personal attributes. Match results can be delivered independently of the conditions that trigger the match. Personal data used for matching can be stored transiently and safeguarded against open access. I'm sure we can debate the efficacy of the technology toward these objectives. But at the least, we should include technology in the debate.
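As a rough illustration of masked matching, here is a minimal Python sketch. It is not how any particular identity resolution product works: the key, the function names, and the choice of name plus birth date as matching attributes are all assumptions made for the example. It compares keyed hashes of normalized attributes and returns only a yes/no result, so the caller never sees the underlying personal data. It handles exact matches after normalization only; real identity resolution also needs fuzzy matching, which this sketch does not attempt.

```python
import hashlib
import hmac

# Hypothetical shared secret for the example; a real deployment would manage keys securely.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def mask(value: str) -> str:
    """Return a keyed hash (HMAC-SHA256) of the normalized value."""
    return hmac.new(SECRET_KEY, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def masked_record(name: str, birth_date: str) -> tuple[str, str]:
    """Build a record that holds only masked tokens, never the raw attributes."""
    return (mask(name), mask(birth_date))


def is_match(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """Report whether two masked records agree, without revealing which values matched."""
    return hmac.compare_digest(a[0], b[0]) and hmac.compare_digest(a[1], b[1])


if __name__ == "__main__":
    left = masked_record("Jane  Doe", "1970-01-01")
    right = masked_record("jane doe", "1970-01-01")
    print(is_match(left, right))
```

Only the masked tokens need to be exchanged or stored, and the match result carries no attribute values with it.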

Information Content Quality

Larry English


One of the root causes of poor quality information is defects in the data definition, specifically the "information product specifications." Because information is a product of our business, manufacturing and service processes, the analogy of an "information product" is real, and the requirement for quality in "information product specifications" is a critical requirement for Information Quality.

This blog is the second of a series of three blogs on the critical quality characteristics (or measures) of information quality required on the TIQM Quality System.

  1. Information Product Specification Data Quality
  2. Information Content Quality
  3. Information Presentation Quality

Information Product Specification Quality Characteristics

  • Information standards
  • Data names
  • Data definitions
  • Attribute valid value set or range of values
  • Value format for structured attributes (VIN, SSN, Product Codes)
  • Business rule specifications of constraints on data
  • Information Steward accountable for data definition quality

Information Content Quality Characteristics: The major information content (data values) quality characteristics include:

  • Definition conformance. Data values are consistent with the attribute (fact) definition
  • Completeness. Each process or decision has all the information it requires
    • Record completeness. A record exists for every real world object or event the enterprise needs to know about
    • Value completeness. A given data element (fact) has a value stored for all records that should have a value
  • Validity. Data values conform to the information product specifications
    • Value validity. A data value is a valid value or within a specified range of valid values for this data element
    • Business rule validity. Data values conform to the specified business rules
    • Derivation validity. A derived or calculated data value is produced correctly according to a specified calculation formula or set of derivation rules. If the base values are accurate and the calculation is correctly performed, then the result will be accurate
  • Accuracy. Data values are correct.
    • Accuracy to surrogate source. The data agrees with an original, corroborative source record of data, such as a notarized birth certificate, document, or unaltered electronic data received from a party outside the control of the organization that is demonstrated to be a reliable source
    • Accuracy to reality. The data correctly reflects the characteristics of a real-world object or event being described. Accuracy and precision represent the highest degree of inherent information quality possible
  • Precision. Data values are correct to the right level of detail, such as price to the penny or weight to the nearest tenth of a gram
  • Non-duplication. There is only one record in a database representing a given real-world object or event
  • Source quality warranties/certifications. The source of information: (1) guarantees the quality of information it provides with remedies for non-compliance; (2) documents its certification in its information quality management capabilities to capture, maintain, and deliver quality information; or (3) provides objective and verifiable measures of the quality of information it provides in agreed-upon quality characteristics
  • Equivalence of redundant or distributed data. Data in one database is semantically equivalent to data about the same objects or events in another database
  • Concurrency of redundant or distributed data. The information float or lag time is minimal between (a) when data is knowable (created or changed) in one database and (b) when it is also knowable in a redundant or distributed database, and concurrent queries to each database produce the same result
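Some of the characteristics above lend themselves to simple automated measures. The following Python sketch is only an illustration: the function names, the sample customer records, and the set of valid state codes are assumptions, not part of TIQM. It computes value completeness, value validity, and a duplicate count over a small list of records. It does not measure accuracy, which requires comparing values with the real-world object or a reliable surrogate source.

```python
from collections import Counter


def value_completeness(records, field):
    """Share of records that have a non-empty value stored for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def value_validity(records, field, valid_values):
    """Share of populated values for `field` that fall within the valid value set."""
    populated = [r[field] for r in records if r.get(field) not in (None, "")]
    if not populated:
        return 0.0
    return sum(1 for v in populated if v in valid_values) / len(populated)


def duplicate_count(records, key_fields):
    """Number of records beyond the first that share the same key values."""
    counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    return sum(n - 1 for n in counts.values() if n > 1)


# Hypothetical sample data for illustration only.
customers = [
    {"id": 1, "name": "Ann Lee", "state": "CA"},
    {"id": 2, "name": "Bob Ray", "state": "ZZ"},
    {"id": 3, "name": "Ann Lee", "state": "CA"},
    {"id": 4, "name": "Cy Wu", "state": ""},
]

if __name__ == "__main__":
    print(value_completeness(customers, "state"))
    print(value_validity(customers, "state", {"CA", "NY"}))
    print(duplicate_count(customers, ["name", "state"]))
```

Matching duplicates on exact key values is the simplest possible choice; in practice, non-duplication checks usually need fuzzy matching on names and addresses.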

For more about Information Content Quality, see Chapter 6, "Assessing
Information Quality," in Improving Data Warehouse and Information Quality.
This contains a more comprehensive list of quality characteristics with examples.
It also describes how to measure these quality characteristics. The next blog
will discuss information presentation quality characteristics required for the
finished Information Product presented to the knowledge workers.

What do you think? Share your experiences in measuring information content
quality, especially accuracy.

Identity Systems Acquisition - the Next Evolution of Data Quality

Chris Cingrani

On May 15th, Informatica completed the acquisition of Identity Systems, a pioneer in identity resolution technology allowing customers to search and match identification data across multiple systems and more than 60 languages. As someone who has been in the data quality space since 2001, I remember encountering Identity Systems in various sales cycles when they were known as Search Software America (SSA). In each instance, I recall that the level of sophistication around matching they offered was something that was difficult to compete against. If we could expand the data quality discussion to include other aspects, such as cleansing and validation, we had a much better chance in the proof of concept or sales cycle. [Read more]

You can't have CDI without Data Quality

Tom Golden

Looking in Webopedia.com recently, I came across a definition for CDI. Yes, Webopedia.com: it bills itself as the #1 online encyclopedia dedicated to computer technology. You might wonder what I was doing surfing this fount of knowledge. Well, I had time on my hands between delayed flights coming back to Europe from the US. You know what they say: "time to spare, travel by air."

The Webopedia.com CDI definition went: "Short for Customer Data Integration, it is the combination of the technology, processes, and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines, and, potentially, enterprises, where there are multiple sources of customer data in multiple application systems and databases."

A bit long-winded perhaps, but the three words that shone out at me through the glare of the fluorescent lights in San Francisco airport were "accurate, timely and complete"; all data quality issues. Despite this, few if any of the Customer Data Integration (CDI) vendors in the market today have truly addressed the data quality issues in their CDI solutions. And anyone who has gone down the route of developing their own custom-built CDI application will be all too familiar with the data quality demands involved.
[Read more]

Information Quality & Management Transformation

Larry English

I recently received an email from one of my early clients. After having worked in four different companies in four different industries, she came to a sad conclusion, writing:

“The thing that they all have in common is a desire to cut corners and deal with quality later. It takes a lot of energy to be the information quality cheerleader, and I find it discouraging and overwhelming at times. Keep writing your articles and books to encourage all the people like me who are dealing with these issues every day.” P. G.

The discovery that P. G. has experienced is, unfortunately, the norm—not the exception. There are two critical elements in this experience.

[Read more]

Valuing Data Quality

Garry Moroney

Determining the aggregated return on investment for a data quality management initiative is notoriously difficult. Typically, a minimum or partial ROI can be estimated by reference to the impact of low quality data on one or two key projects or processes. For example, in a CRM project, data quality ROI can be tied to reductions in customer contact failures and increased sales due to high quality segmentation. But the same set of master data will be used more than once in most organizations (e.g. customer master data will also be used in the billing system, the supply chain system and so on) and will add value (or destroy value!) in all of these processes, so basing your ROI calculations on a single system or process will always underestimate the true returns.

For an organization trying to estimate the total returns across the enterprise from a data quality initiative, there are two difficult questions that must be addressed:

• How valuable is this dataset to the enterprise - assuming 100% data quality?
• How does its value decrease as quality erodes?

While these questions might at first seem unanswerable, they are not unusual questions for a business to ask. In fact, businesses need to be able to answer questions of worth and depreciation for all their tangible assets: property, stock, etc.

Unfortunately, data is one of those intangible assets where normal valuation approaches like recorded cost or replacement value are ineffective. But there are other intangible assets, such as IPR, work-in-progress, and customer and partner relationships (goodwill), where significant research has been done to develop effective valuation methodologies. It just might be possible to leverage these methodologies to value your data. For example, the value of customer data is directly related to the value of the customers themselves, and so "customer lifetime value" methodologies should be applicable in estimating the value of customer data and the extent to which this value varies with data quality.
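To make the idea concrete, here is a small Python sketch of one way the two questions might be approached. Everything in it is an assumption for illustration: it uses a simple discounted customer lifetime value formula, and it assumes value erodes linearly with the share of customer records that are still usable. A real valuation would need a model fitted to your own business.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Sum of expected yearly margins, each discounted back to today.

    Year t contributes annual_margin * retention_rate**t / (1 + discount_rate)**t.
    """
    return sum(
        annual_margin * (retention_rate ** t) / ((1 + discount_rate) ** t)
        for t in range(years)
    )


def dataset_value(clv_per_customer, customer_count, usable_share):
    """Estimated value of a customer dataset.

    Assumes (for illustration) that only customers with usable records
    contribute value, so value falls linearly as quality erodes.
    """
    return clv_per_customer * customer_count * usable_share


if __name__ == "__main__":
    # Hypothetical inputs: 100 margin per year, 80% retention, 10% discount rate, 5 years.
    clv = customer_lifetime_value(100.0, 0.8, 0.10, 5)
    for share in (1.0, 0.9, 0.7):
        print(share, round(dataset_value(clv, 10_000, share), 2))
```

The value at a usable share of 1.0 answers the first question under these assumptions, and the drop as the share falls gives a first approximation to the second.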

Have any of you out there attempted to put a real value on your company data in this way? Perhaps you'd be willing to share your experiences with us.

For more information on building a business case for data quality and calculating potential return on investment see the Informatica white papers: Data Quality Profiling Calculating ROI for Data Migration and Data Integration Projects and The Data Quality Business Case—Projecting Return on Investment.

The Gift of High Quality Information

Larry English

What would happen if your knowledge workers returned from the holidays and when they “opened” their data marts, they found nothing but high quality information? No missing information to have to hunt for. No wrong information to have to correct. No misleading information to cause them to make the wrong decision.

Imagine what it would be like if people could do their value work without hunting for, correcting, or recovering from failure caused by poor quality

[Read more]
