Data integrity is closely linked to the concept of trust which, in the world of human interactions, is based on a tight coupling between words and actions (do what you say and say what you do). In the IT world, this translates into first having a clear definition of data as well as how it is treated in the context of various business processes. If we have a clear definition of data, including policies such as access, privacy, change controls, etc. (the words), and if we have systems that consistently enforce the definition (the actions) then we have high trust and high data integrity. We know exactly what to expect, and the data always exactly matches our expectations.
Data itself has a life-cycle that involves being created (or manufactured or captured), processed, distributed, consumed, stored, archived, and eventually destroyed. Accountability for data throughout the life-cycle is an important factor in addressing data integrity issues. Two key concepts in this regard are System of Source and System of Record (SOR). Normally these two are the same; when we talk about the SOR, we are usually also referring to the system that first creates or captures the data. It is possible, however, that the System of Source may be different from the SOR. For example, a customer master data repository could be declared the official SOR for selected customer data even though the data was originally sourced through other systems. In other words, the SOR is simply the name we give to the place that everyone agrees holds the best version of the truth.
Integration Systems are Business Systems. The Enterprise Data Warehouse is a system and so is the Master Customer Repository or the Data Services Bus. Integration systems serve a business purpose which is to present a consolidated or aggregated view of data from multiple systems, or to distribute data internally or externally to other consumers of the data. Consolidating, reporting and distributing data is a business function; if we didn’t have systems to do it, then people would have to do it manually. Collectively, these shared integration components are arguably the biggest application system at most organizations, yet often they have not had clear ownership and accountability.
Each Enterprise Business System, including Integration Systems, must have three owners identified (it may be possible for one individual to serve more than one ownership role):
- Business Owner: Responsible for the system life-cycle (including when and how to retire the system) and leading the investment and prioritization decision making for changes to the system. The purpose of some business systems is to serve the IT function (such as the IT project management system or the IT configuration management system), in which case the Business Owner may in fact be an IT person.
- IT Owner: Responsible for change management processes including problem management (identifying patterns in production incidents and developing permanent solutions), capacity planning, performance management, and coordinating upgrades or enhancements.
- Operations Owner: Responsible for day-to-day operations of the system, maintaining service levels, and responding to and resolving production incidents.
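One way to make the three-owner rule checkable is to keep a small ownership registry and flag systems with unfilled roles. The following is a minimal sketch only; the `SystemOwnership` class, its field names, and the sample systems and people are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SystemOwnership:
    """Hypothetical ownership record for one Enterprise Business System."""
    system: str
    business_owner: Optional[str] = None
    it_owner: Optional[str] = None
    operations_owner: Optional[str] = None

    def missing_roles(self) -> List[str]:
        """Return the names of ownership roles that nobody holds yet."""
        roles = {
            "Business Owner": self.business_owner,
            "IT Owner": self.it_owner,
            "Operations Owner": self.operations_owner,
        }
        return [role for role, person in roles.items() if not person]


# Illustrative registry: one person may hold more than one role.
registry = [
    SystemOwnership("Enterprise Data Warehouse", "Ana", None, "Raj"),
    SystemOwnership("Master Customer Repository", "Lee", "Lee", "Kim"),
]

# Map each system with at least one unfilled role to the roles it lacks.
gaps = {s.system: s.missing_roles() for s in registry if s.missing_roles()}
```

In this sketch, integration systems are registered exactly like any other business system, so a gap in their ownership shows up in the same report.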
To resolve data integrity (trust) issues, we need to have people responsible and accountable for the words and the actions. There are two key roles that organizations generally employ to accomplish this as part of a data governance program: Data Owners (actions) and Data Stewards (words).
- Data Owner: Responsible for the System of Source or the System of Record. Since each data element in an enterprise is first created or captured in one of the Enterprise Business Systems, and since each application has a business owner, by association we therefore have the data owner identified. The data owner is responsible for resolving “garbage in” problems. If the data capture process allows data into the system that is inconsistent with the defined rules/standards, it is the responsibility of the data (system) owner to fix it.
- Data Steward: Responsible for capturing an accurate and complete description of each data element of interest to the enterprise (not all data is of interest to the enterprise), and gaining agreement across the enterprise. If agreement cannot be reached, the Data Steward may use the Data Governance function to mediate a resolution. The definition, once agreed, is not a static thing; it evolves as new understandings of nuances of business rules are uncovered, or as business processes, system operations, or external factors require changes to the definitions. The Data Steward is therefore responsible for maintaining the definitions over time, communicating them to the up-stream and down-stream processes that produce or consume the data, and monitoring performance and conformance to the rules.
Another way to think of the difference between the two roles is that the Data Owner is responsible for creating and using data in the context of the system of source, while the Data Steward is responsible for the effective distribution and use of the data by everyone else in the enterprise.
The process of improving data integrity is just that: a process. It requires clearly defined roles and responsibilities, and procedures for defining data, communicating the definitions, resolving differences, and monitoring compliance. First we need stable and standard processes; then we need to automate the routine, repetitive steps in the process with tools such as data scorecards for monitoring data quality, a business glossary for capturing and communicating definitions, and metadata repositories for documenting data lineage and enabling impact analysis. In short, if data is an asset, we should invest in the people, process and technology needed to effectively control data integrity and maximize the value of the asset.
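As an illustration of what an automated data scorecard might look like, the sketch below computes, for each agreed rule, the fraction of records that conform to it. This is an assumption-laden example rather than a reference to any particular tool: the `scorecard` function, the rule names, and the sample customer records are all hypothetical.

```python
from typing import Callable, Dict, List

# A rule takes one record and returns True when the record conforms.
Rule = Callable[[dict], bool]


def scorecard(records: List[dict], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return the fraction of records passing each rule (1.0 if no records)."""
    total = len(records)
    return {
        name: (sum(1 for r in records if rule(r)) / total if total else 1.0)
        for name, rule in rules.items()
    }


# Hypothetical sample data and rules, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "c@example.com", "country": "ZZ"},
    {"id": 4, "email": "d@example.com", "country": "DE"},
]
rules = {
    "email_present": lambda r: bool(r.get("email")),
    "country_known": lambda r: r.get("country") in {"US", "DE", "FR"},
}

scores = scorecard(customers, rules)
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

The rules here stand in for the definitions the Data Steward maintains (the words), while the scores measure how consistently the systems enforce them (the actions).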