The importance of high-quality data requires constant vigilance

First in a two-part blog series about monitoring data quality performance using data quality metrics.

Organizational data quality management is often introduced in reaction to an acute issue or to ongoing problems in which data failures have adversely affected the business. This reactive approach is typically marked by a rush to identify, evaluate, and purchase technical solutions that address the symptoms of those problems, instead of isolating their root causes and eliminating the sources of flawed data.

In leading organizations, the business case for data quality improvement may be developed by assessing how poor data quality affects the achievement of business objectives and by reviewing how holistic, enterprise-wide approaches to data quality management can benefit the organization as a whole. By following a process to justify the costs of investing in data quality improvement, you can identify the key business areas that are affected by poor data quality and whose performance improvements are tied to high-quality information.

Clearly, data quality is not a one-time effort. The events and changes that allow flawed data into an environment are not isolated occurrences; new and insidious ways to degrade the quality of data keep appearing. Data governance teams must therefore not only address acute data failures but also establish a baseline for the current state of data quality, so that they can identify critical failure points and set improvement targets. This involves a few critical ideas:

  • Organizations need a way to formalize data quality expectations as a means for measuring conformance of data to those expectations
  • Organizations must be able to establish a baseline for the levels of data quality and provide a mechanism to identify leakages as well as analyze root causes of data failures
  • Organizations must be able to effectively establish and communicate to the business community the level of confidence they should have in their data, which necessitates a means for measuring, monitoring, and tracking data quality
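As a minimal sketch of the first and third ideas, assuming records arrive as Python dictionaries, an expectation can be formalized as a named rule plus an acceptable conformance threshold, and each measurement run can be appended to a history so the trend can be tracked and reported. The `Expectation` class, the `email` field, and the 95% threshold are hypothetical illustrations, not features of any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Mapping

Record = Mapping[str, Any]


@dataclass
class Expectation:
    """A hypothetical formalized data quality expectation."""
    name: str
    rule: Callable[[Record], bool]  # returns True when a record conforms
    threshold: float                # minimum acceptable conformance rate, 0..1
    history: List[float] = field(default_factory=list)  # one rate per measurement run

    def measure(self, records: Iterable[Record]) -> float:
        """Compute the conformance rate for one batch and append it to the history."""
        rows = list(records)
        # Assumption: an empty batch is treated as fully conformant.
        rate = 1.0 if not rows else sum(1 for r in rows if self.rule(r)) / len(rows)
        self.history.append(rate)
        return rate

    def meets_threshold(self) -> bool:
        """Whether the most recent measurement satisfies the expectation."""
        return bool(self.history) and self.history[-1] >= self.threshold


email_present = Expectation(
    name="customer email present",
    rule=lambda r: bool(r.get("email")),
    threshold=0.95,
)

# Two hypothetical measurement runs over small batches.
email_present.measure([{"email": "a@example.com"}, {"email": ""},
                       {"email": "c@example.com"}, {"email": None}])
email_present.measure([{"email": "a@example.com"}, {"email": "b@example.com"},
                       {"email": "c@example.com"}, {"email": ""}])
print(email_present.history, email_present.meets_threshold())  # [0.5, 0.75] False
```

The history shows conformance rising from 50% to 75% across the two runs while still missing the 95% expectation, which is the kind of trend worth communicating to the business community.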

The ability to motivate data quality improvement as a driver of increasing business productivity demonstrates a level of organizational maturity that views information as an asset. When you have captured how information gaps correspond to lowered business performance, the next logical step is to convey the productivity benefits that result from general data governance, stewardship, and management.

Interestingly, these two activities are two sides of the same coin: both depend on determining the value added by improved data quality as a function of conformance to business expectations, and on how those expectations are measured in relation to component data quality rules. If business success is quantifiable, and the business's dependence on high-quality data is measurable, then any improvement to the information asset should be reflected in measurable business performance improvements as well.

This suggests that the metrics used for monitoring the quality of data can actually roll up into higher-level performance indicators for the business as a whole. Once you understand how poor data quality affects both operational activities and strategic initiatives, you can see that the process used to assess business impact and justify data quality improvement can, in turn, be used to monitor ongoing data quality management. And by relating those business impacts to data quality rules, an organization can use those rules to establish a baseline measurement and to monitor data quality performance on an ongoing basis.
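To illustrate the roll-up, here is a minimal sketch that assumes rule-level conformance scores are already available and that the organization has agreed on how much each rule contributes to a business-level indicator. The rule names, the `order_deliverability` indicator, and the weights are all hypothetical, and a normalized weighted average is only one plausible way to aggregate.

```python
from typing import Dict

# Hypothetical rule-level conformance scores (fraction of records passing each rule).
rule_scores: Dict[str, float] = {
    "product_id_present": 0.98,
    "product_description_accurate": 0.90,
    "order_data_consistent": 0.85,
}

# Hypothetical weights: how strongly each rule drives a business-level indicator.
indicator_weights: Dict[str, Dict[str, float]] = {
    "order_deliverability": {
        "product_id_present": 0.5,
        "product_description_accurate": 0.3,
        "order_data_consistent": 0.2,
    },
}


def roll_up(scores: Dict[str, float],
            weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine rule scores into each indicator as a normalized weighted average."""
    indicators = {}
    for indicator, rule_weights in weights.items():
        total_weight = sum(rule_weights.values())
        weighted = sum(scores[rule] * w for rule, w in rule_weights.items())
        indicators[indicator] = weighted / total_weight
    return indicators


indicators = roll_up(rule_scores, indicator_weights)
print(round(indicators["order_deliverability"], 2))  # 0.93
```

Normalizing by the total weight keeps the indicator on the same 0 to 1 scale as the underlying rule scores, so it can be compared directly with them.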

Data Quality Governance and Data Quality Metrics

Establishing a business case for data quality improvement hinges on the ability to document the pains that the business incurs because of data defects. Segmenting those pains across impact dimensions and categorizing each impact within the lower levels of a hierarchical taxonomy makes it easier to research the negative financial impacts specifically attributable to bad data. Reviewing the scale of the data failures in terms of their corresponding financial impacts suggests ways to prioritize the remediation of data defects, which in turn relies on data quality solutions.
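A minimal sketch of this analysis, assuming each documented impact has been tagged with the defect that caused it, a path in a hierarchical impact taxonomy, and an estimated cost. The top-level categories follow the ones used in this post (decreased revenues, increased costs); the lower-level categories, defects, and dollar amounts are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical impact records: (defect, taxonomy path, estimated annual cost in USD).
# The cost figures are illustrative placeholders, not measurements.
Impact = Tuple[str, Tuple[str, ...], int]

impacts: List[Impact] = [
    ("missing product identifier", ("increased costs", "order rework"), 40_000),
    ("missing product identifier", ("decreased revenues", "cancelled orders"), 25_000),
    ("inaccurate product description", ("decreased revenues", "product returns"), 30_000),
    ("inconsistent customer address", ("increased costs", "failed deliveries"), 15_000),
]


def cost_by_level(rows: List[Impact], depth: int) -> Dict[Tuple[str, ...], int]:
    """Sum impact costs at a given depth of the taxonomy (1 = top-level category)."""
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    for _, path, cost in rows:
        totals[path[:depth]] += cost
    return dict(totals)


def remediation_priority(rows: List[Impact]) -> List[Tuple[str, int]]:
    """Rank defects by the total financial impact attributed to them."""
    per_defect: Dict[str, int] = defaultdict(int)
    for defect, _, cost in rows:
        per_defect[defect] += cost
    return sorted(per_defect.items(), key=lambda item: item[1], reverse=True)


print(cost_by_level(impacts, depth=1))
print(remediation_priority(impacts))
```

Here the missing product identifier ranks first because its cost is spread across two taxonomy branches, which a per-category view alone would hide.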

A fundamental challenge in using return on investment to justify funding an improvement project lies in monitoring, over time, whether the improvements implemented through the project are delivering the promised positive impacts. So, turning around the approach used to establish the business case, we can see that if business performance, customer satisfaction, compliance, and automated logistics are all directly tied to ensuring high-quality data, then we should be able to use similar metrics to evaluate the ongoing effectiveness of the data quality program. Documenting this approach, standardizing its roles and responsibilities, and integrating the right tools and methods are the first key tasks in developing a data governance framework.
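One way to monitor this, sketched under the assumption that the program can estimate a realized benefit (for example, a measured reduction in impact costs) and its own cost for each period, is to recompute cumulative ROI after every period and compare it with the ROI promised in the business case. The `Period` structure, the quarterly figures, and the 0.5 target are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Period:
    label: str
    realized_benefit: float  # measured reduction in impact costs during the period
    program_cost: float      # spend on the data quality program during the period


def cumulative_roi(periods: List[Period]) -> List[Tuple[str, float]]:
    """Cumulative ROI after each period, using ROI = (benefit - cost) / cost."""
    results = []
    benefit = cost = 0.0
    for p in periods:
        benefit += p.realized_benefit
        cost += p.program_cost
        if cost == 0:
            raise ValueError("cumulative program cost must be positive to compute ROI")
        results.append((p.label, (benefit - cost) / cost))
    return results


PROMISED_ROI = 0.5  # hypothetical target taken from the business case

periods = [
    Period("Q1", realized_benefit=10_000, program_cost=20_000),
    Period("Q2", realized_benefit=30_000, program_cost=10_000),
    Period("Q3", realized_benefit=30_000, program_cost=10_000),
]

for label, roi in cumulative_roi(periods):
    status = "on track" if roi >= PROMISED_ROI else "below promise"
    print(f"{label}: ROI {roi:.2f} ({status})")
```

Tracking the cumulative figure rather than each quarter in isolation reflects that early periods usually carry setup costs before the benefits appear.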

Positive Impacts of Improved Data Quality

We develop a business case by assessing the negative impacts of poor data quality across a number of high-level categories: decreased revenues, increased costs, increased risk, and decreased confidence. Because a proactive approach to data governance and data quality makes it possible to identify where flawed data is introduced within the application framework, the flawed processes responsible for injecting unexpected data can be corrected, eliminating the source of the data problem. As we eliminate the sources of poor data quality, we should shift from looking at the negative impacts of poor data quality to considering the positive impacts of improved data quality, namely: increased revenues, decreased costs, decreased risks, and increased confidence.

Business Policy, Data Governance, and Rules

Not only did the impact analysis phase of the business case process identify impact areas, but it also provided some level of measurement and corresponding metrics. For example, Figure 1 shows how data errors introduced at an early stage of processing contribute to downstream business impacts. The missing product identifiers, inaccurate product descriptions, and inconsistency across different systems contributed to the business impacts listed on the right.

Each impact area corresponds to missed expectations associated with a business policy, as shown in Table 1. The cost of each impact is assessed as part of the business case development, and that assessment provides both a baseline measurement and a target for improvement.
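Because the assessed cost gives both a starting point and a goal, progress for an impact area can be expressed as the fraction of the gap between baseline and target that has been closed. The sketch below assumes a single numeric metric per impact area; the `gap_closed` function and all figures are hypothetical.

```python
def gap_closed(baseline: float, target: float, current: float) -> float:
    """Fraction of the distance from baseline to target covered so far.

    Works whether the metric should go down (e.g. impact cost) or up
    (e.g. a conformance rate): 0.0 means no change, 1.0 means the target is reached,
    and a negative value means the metric has moved away from the target.
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    return (current - baseline) / (target - baseline)


# Hypothetical figures for one impact area: annual cost of undeliverable orders.
print(gap_closed(baseline=120_000, target=30_000, current=75_000))  # 0.5

# The same function applied to a conformance rate that should rise.
print(gap_closed(baseline=0.80, target=0.98, current=0.89))
```

Normalizing progress this way lets impact areas measured in dollars and those measured as rates be reported side by side.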

Consider the business policy associated with impact #4: ‘All orders must be deliverable’. The impact is incurred because of missing product identifiers, inaccurate product descriptions, and inconsistency across different subsystems, each of which reduces the deliverability of an order. In turn, ensuring that product identifiers are present, that product descriptions are accurate, and that data remains consistent across applications will improve the deliverability of orders.
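To make the decomposition concrete, here is a minimal sketch that turns the three contributing factors into executable checks on each order. It assumes a product master serves as the reference for accurate descriptions and that a separate fulfillment system holds its own copy of each order's product identifier; both data sets, all field names, and the sample orders are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical reference data: the product master and a downstream fulfillment
# system that holds its own copy of each order's product identifier.
PRODUCT_MASTER: Dict[str, str] = {
    "P100": "Steel bolt, 10 mm",
    "P200": "Copper washer, 12 mm",
}
FULFILLMENT: Dict[str, str] = {"O1": "P100", "O2": "P200", "O3": "P100", "O4": "P100"}

Order = Dict[str, Optional[str]]


def product_id_present(order: Order) -> bool:
    return bool(order["product_id"])


def description_accurate(order: Order) -> bool:
    # A missing identifier also fails this lookup against the product master.
    return PRODUCT_MASTER.get(order["product_id"]) == order["description"]


def consistent_across_systems(order: Order) -> bool:
    return FULFILLMENT.get(order["order_id"]) == order["product_id"]


# Each rule is one data assertion embedded in the policy "All orders must be deliverable".
RULES: Dict[str, Callable[[Order], bool]] = {
    "product_id_present": product_id_present,
    "description_accurate": description_accurate,
    "consistent_across_systems": consistent_across_systems,
}


def evaluate(orders: List[Order]) -> Tuple[float, Dict[str, List[str]]]:
    """Return the share of orders passing every rule, plus failing order IDs per rule."""
    failures: Dict[str, List[str]] = {name: [] for name in RULES}
    deliverable = 0
    for order in orders:
        passed_all = True
        for name, rule in RULES.items():
            if not rule(order):
                failures[name].append(order["order_id"])
                passed_all = False
        deliverable += int(passed_all)
    return deliverable / len(orders), failures


orders: List[Order] = [
    {"order_id": "O1", "product_id": "P100", "description": "Steel bolt, 10 mm"},
    {"order_id": "O2", "product_id": "P200", "description": "Copper washer, 10 mm"},
    {"order_id": "O3", "product_id": "P200", "description": "Copper washer, 12 mm"},
    {"order_id": "O4", "product_id": None, "description": "Steel bolt, 10 mm"},
]

rate, failures = evaluate(orders)
print(rate)  # 0.25
print(failures)
```

Reporting failing order IDs per rule, not just the overall rate, points remediation at the specific assertion (and the upstream process behind it) that each order violated.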

This assurance comes from instituting data governance principles across the organization, which give it the ability to implement, audit, and monitor data quality at multiple points across the enterprise and to measure consistency and conformity against the associated business expectations and key performance indicators. These tasks integrate with the management structure, processes, policies, standards, and technologies the organization needs to manage and ensure the quality of data against business policy requirements. This framework then supports ownership, responsibility, and accountability for instituting capable data processes that deliver measurable improvement in data quality performance.

Metrics for Quantifying Data Quality Performance

Putting governance into practice is a challenge. As the example business policies above show, these policies are typically stated in natural language, for example: “Maintain cost ratio for promotions to sales.” This format impedes the ability to measure conformance. Your objective is to apply a process of semantic refinement that quantifies data quality performance and lets you develop meaningful metrics associated with well-defined data quality dimensions. The refinement steps include:

  1. Identifying the key data assertions associated with business policies
  2. Determining how those data assertions relate to quantifiable business impact
  3. Evaluating how the identified data flaws are categorized within a set of data quality dimensions and specifying the data rules that measure their occurrence
  4. Quantifying the contribution of each flaw to conformance with each business policy and
  5. Articulating and implementing the data rules

This process extracts the information-based assertions embedded within business policies, categorizes those assertions within a measurement framework, and determines how each contributes to measuring overall conformance to the business policies. One policy can embed a number of data quality rules, and each rule can be categorized within one of the defined dimensions of data quality.
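As a sketch of steps 3 and 5, assuming records are Python dictionaries, the natural-language policy “Maintain cost ratio for promotions to sales” can be refined into named data rules, each tagged with exactly one dimension and scored separately. The field names (`promo_cost`, `sales`), the choice of completeness and validity as the dimensions, and the sample promotions are hypothetical; averaging rule scores within a dimension is one simple option, not a prescribed method.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping, Tuple

Record = Mapping[str, Any]


@dataclass(frozen=True)
class DataRule:
    """One data assertion extracted from a business policy (hypothetical structure)."""
    name: str
    dimension: str                   # e.g. "completeness", "validity"
    check: Callable[[Record], bool]  # True when a record conforms


@dataclass
class BusinessPolicy:
    statement: str
    rules: List[DataRule] = field(default_factory=list)


def measure(policy: BusinessPolicy,
            records: List[Record]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Score each rule, then average the rule scores within each dimension."""
    by_rule: Dict[str, float] = {}
    grouped: Dict[str, List[float]] = defaultdict(list)
    for rule in policy.rules:
        score = sum(1 for r in records if rule.check(r)) / len(records)
        by_rule[rule.name] = score
        grouped[rule.dimension].append(score)
    by_dimension = {dim: sum(s) / len(s) for dim, s in grouped.items()}
    return by_rule, by_dimension


policy = BusinessPolicy(
    statement="Maintain cost ratio for promotions to sales",
    rules=[
        DataRule("promo_cost_present", "completeness",
                 lambda r: r["promo_cost"] is not None),
        DataRule("sales_present", "completeness",
                 lambda r: r["sales"] is not None),
        # Missing costs are left to the completeness rule so each flaw is counted once.
        DataRule("promo_cost_non_negative", "validity",
                 lambda r: r["promo_cost"] is None or r["promo_cost"] >= 0),
    ],
)

promotions = [
    {"promo_id": "A", "promo_cost": 500.0, "sales": 10_000.0},
    {"promo_id": "B", "promo_cost": None, "sales": 8_000.0},
    {"promo_id": "C", "promo_cost": -50.0, "sales": 4_000.0},
    {"promo_id": "D", "promo_cost": 300.0, "sales": None},
    {"promo_id": "E", "promo_cost": -10.0, "sales": 2_000.0},
]

by_rule, by_dimension = measure(policy, promotions)
print(by_rule)       # {'promo_cost_present': 0.8, 'sales_present': 0.8, 'promo_cost_non_negative': 0.6}
print(by_dimension)  # {'completeness': 0.8, 'validity': 0.6}
```

Keeping each rule in a single dimension means the dimension-level scores show where a policy is failing (here, invalid values rather than missing ones), which is the basis for prioritizing the next round of improvement.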

Breaking data issues down into these key attributes highlights where best to focus your data quality improvement efforts, because it identifies the most important data quality issues and attributes for the lifecycle stage of each project. For example, early in a data migration the focus may be on the completeness of key master data fields, whereas implementing an e-commerce system may demand greater attention to accuracy during individual authentication.

For a practical example, our hands-on workbook, “How to Improve Data Quality for Your Cloud Data Warehouse,” guides you through some of the steps outlined above to help ensure your data is high quality and fit for purpose.

That’s all for this post. In my next post, I will look at how organizing data quality rules within defined data quality dimensions not only simplifies how you specify and measure levels of data quality but also provides the underlying structure for transforming data quality expectations into a set of measurable, reportable, actionable assertions. In the meantime, why not sign up for a Free 30-day Trial of Cloud Data Quality?