So You Want One Version Of The Truth?

I told the head of the Enterprise Data Warehouse at a large bank, “you don’t have a data warehouse, you have 50,000 tables.” The issue was that the bank had built the EDW without the necessary fundamentals in place. It wasn’t for lack of money; in fact, the EDW was one of the biggest “money sinks” in the bank. The problem was that it sat on a sinking foundation.

One version of the truth isn’t achieved by putting all your data in one big system or one big database – that’s impossible. An enterprise data warehouse is indeed part of the solution, but it needs to be built on a solid foundation. What does a solid foundation look like? Here are five pillars for one version of the truth.

  1. Establish a Metadata Management Office (MMO) that is responsible for usability and performance of the metadata system. This involves a disciplined approach to deriving value from a federated collection of repositories about data at rest, data in motion and data changes. Check out my prior post for more on this topic. The first pillar of the foundation is to have a permanent organizational group that is the equivalent of the accounting department for financial assets. Just like the accountants don’t make the investment or financing decisions, the MMO doesn’t define business terms or establish security policies, but they can tell you where the data is and how it’s changing.
  2. Implement a Business Glossary or Data Dictionary based on logical models for business domains, and appoint data stewards to maintain the definitions. Note that I didn’t say “an enterprise logical model.” While enterprise reference models are indeed valuable for strategic alignment and business transformation planning, logical data models are most often successful at the business domain level since they generally die under the weight of complexity at the enterprise level. Furthermore, note the term “maintain” – definitions are not static and need owners to keep them relevant as the world turns and the business and IT evolve. This is the second pillar for one version of the truth. Oftentimes, multiple versions of the truth are a definitional problem in the eyes of the beholder – it’s not the data’s fault.
  3. Measure data quality on an ongoing basis. It goes without saying that “you can’t manage what you don’t measure,” yet many organizations still try to manage data without metrics, which results in a piecemeal, ad hoc, fire-fighting approach to improving data integrity. The third pillar is to get data clean and keep it clean by establishing data quality standards, measuring it regularly and communicating the results to both management and front line staff. In short, shine a bright light on data quality.
  4. Establish clear accountability for business systems and integration systems such as the enterprise data warehouse or MDM system.  Multiple versions of the truth are often a result of the same information being captured and stored in slightly different ways by different systems.  Before you can resolve the differences and identify the system-of-record or source-of-record, you need to know who to talk to.  Each system should have a business owner who drives the investment and prioritization decisions, an IT owner who leads the planning and change management activities, and an operations owner who is responsible for maintaining service levels and resolving production incidents. The fourth pillar therefore is a definitive list of applications (yes all of them in the organization) and clear accountability for each of them.
  5. Establish a Data Governance program to resolve disputes, define policies for data access, set priorities for data improvement initiatives and maintain a data risk register. For the data governance council to be effective, it needs input from the other four pillars, and it needs the other four pillars to implement and execute its policies and decisions. Data governance without DQ metrics, MMO data lineage or change impact reports, clear system accountability, or a business glossary to communicate the results is like trying to govern by tribal knowledge. The fifth pillar therefore cannot stand on its own, and is an essential stabilizer for the other four pillars. For more on this topic, check out Rob Karel’s recent post on Informatica’s Data Governance Framework.
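
To make the third pillar concrete, here is a minimal sketch of what "measuring data quality" can look like in Python. Everything in it is an assumption made for illustration: the field names (`customer_id`, `email`), the three metrics chosen (completeness, validity, uniqueness), the deliberately simple email pattern, and the sample rows. A real program would take its standards and thresholds from the data stewards and the governance council.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately loose pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class QualityReport:
    total: int
    completeness: float  # share of rows with every required field filled in
    validity: float      # share of rows whose email matches EMAIL_RE
    uniqueness: float    # distinct customer_id values divided by row count


def measure_quality(rows, required=("customer_id", "email")):
    """Compute three simple data quality ratios over a list of dict rows."""
    total = len(rows)
    if total == 0:
        # Nothing to measure; report perfect scores rather than divide by zero.
        return QualityReport(0, 1.0, 1.0, 1.0)
    complete = sum(
        all(str(row.get(field) or "").strip() for field in required)
        for row in rows
    )
    valid = sum(bool(EMAIL_RE.match(str(row.get("email") or ""))) for row in rows)
    distinct = len({row.get("customer_id") for row in rows})
    return QualityReport(total, complete / total, valid / total, distinct / total)


# Illustrative sample: one missing email, one duplicate id, one malformed email.
sample = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C2", "email": "b@example.com"},
    {"customer_id": "C3", "email": "not-an-email"},
]

report = measure_quality(sample)
print(report)
```

Run on a schedule and published to both management and front line staff, even ratios this simple replace anecdotes with a trend line, which is the point of the pillar.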

Before you are scared off by thinking, “Wow, this is a tall order. We could never achieve all these disciplines in our organization,” you should know that it takes time to build these capabilities and you don’t need to be “perfect” with all of them from day one. It takes most organizations from two to five years to build mature capabilities in each area. The point nonetheless remains – achieving one version of the truth is not simply a technical problem – it is an agreement problem which requires discipline in all five pillars.


6 Responses to So You Want One Version Of The Truth?

  1. “get data clean and keep it clean by establishing data quality standards, measuring it regularly and communicating the results to both management and front line staff.”

    It’s so important that everyone be kept in the loop to ensure data quality. If different teams approach data management differently, then there is also going to be a disconnect. Everyone needs to know their role and what’s expected.

  2. Sanjay Pande says:

    While your recommendations are all good and do stand, there is a core disconnect on the whole “single version of the truth” myth.

    Yes, I said myth and it plagues our industry.

    The focus should be an accurate reflection of the state of the data at a point in time which is a technical problem and easily solved.

    Truth is subjective and it’s acceptable (or should be acceptable) for different versions of what the data stated at points in time to be reflected. The whole pursuit of the single version of the truth is the cause of money sinks. Manipulating/transforming the data to achieve this lofty goal is also pretty much against every rule of compliance.

    What the DW should do is reflect the state of data at points in time and enable other interesting information to come out of it. In fact the word “truth” should really never be used in Business Intelligence.

    While 50,000 tables does seem rather excessive (even for a super-normalized DW), it really depends on the integration points and what they’re doing with this information and how they’re using it.

    As long as your DW is in fact your system of record, all the other goals you mention can be accomplished.

    It’s actually not all that hard to accomplish all the above goals and has been done many times with some fundamental changes to our thinking and the way we use the DW.

    The first of course is making your DW a system of record that reflects the true state of data at any point in time. This enables integration and measurements of the state of data quality and metrics on its improvement over time.

    The second is to use separate processes and data models for absorption and for data dissemination to business users. That way processes can be built, scaled, tuned and maintained with simple goals.

    The third and most controversial is to actually move the business rules (with a few exceptions) out of the DW and on the way to the Data Marts or other user-delivery layers which enables rapid data pulls and also audit and compliance of all data in the DW automatically.

    This separation has many benefits as long as the DW has structures that are flexible and tuned to its purpose, which is being the integrated enterprise data repository with reflections of the state of source data at any point in time.

    You can only ever do this with a top-down architecture, but make sure the implementation is in fact bottom up so you’re doing it from user reqs and not building a 50K table monolith which nobody uses. A bottom-up architecture will only create silos and multiple versions of everything.

    Informatica is doing a great job of facilitation with its tools, but there are many things that can be improved in the methodology.

    • John Schmidt says:

      Sanjay, thanks for your detailed and thoughtful comment. I must say however that I disagree with your definition of truth. If a tree falls in a forest and there is no one there to hear it, does it make a sound? The answer is no. Sound is a human experience – it requires a sentient being to observe it – otherwise it is simply pressure waves moving through a medium.

      The truth about data is the same thing. There is no absolute truth. Truth requires an observer, which brings in the human element. To achieve one version of the truth is an agreement problem, not a technical problem. Technology clearly does play a role, but only a supporting one. The main actors are actually the business users with their perceptions of what the data means and the process context in which it is used. Therefore, Data Governance must address the definitional issues related to data and gain agreement across lines of business. While this is not an “easy technical problem” as you suggest but rather a collaborative team-building project, it can nonetheless be addressed through an effective data governance program.


  3. Terry says:

    I agree in part with what everyone is saying in this post, but the reality is that when we walk into a company, they are already well on their way with established, older, and less reliable data systems. Take for instance my last company: they used a product that had no controls built into the data entry fields; they were all free form and you could put just about anything you wanted into them. Now, you also have some newer systems, and the controls have been built in to these newer applications. When you begin to look at MDM, EDW, and data integrity, it becomes a tall, very tall order for most companies. Now if you don’t have buy in from the very top of your organization that you need to cleanse this data and get it organized, this will fail every time. There has to be a way to be able to do this, smaller chunks at a time possibly, so that over time you create the mechanisms and learning, and basically it then becomes second nature to operate this way. But it could take years, and again that depends on how much you want to throw into this. Most places, and it is slowly changing, just try to integrate the data while at the same time mitigating as much of the risk to data integrity as possible on a day to day basis.

    I agree that today and even in the past MDM and EDW should have been thought of as they are today, an integral part of doing business, but since they have not traditionally been thought of that way, we are playing catch up and it seems we are losing the battles still. This takes a lot of effort and time, while still trying to keep your businesses running, and like I said earlier it takes buy in from the top down to be successful. Thanks for listening….

    • John Schmidt says:

      Terry, I think your main point is that it’s hard work achieving one version of the truth. I agree. Thanks for your comment.

  4. Richard says:

    Hi John,

    I agree that there is a myth around ‘single version of truth’. In my opinion, there are very few single versions. You have fundamental differences between how one part of the business measures things versus another. For example, a finance department will focus on ledger movements, yet marketing may focus on sales funnel activities. Measuring both will require different approaches that may not match, due to the date selection criteria and how value is calculated. Yet they both look at the same business transactions.

    So being able to describe how data is aggregated and rolled into information is absolutely key. If you can define fundamentally how different measures work and their purpose, it is then possible to have different versions of the truth. But everyone understands why they are different, and can differentiate.

    Even a single view of a customer may be impossible to achieve, if your business deals with standard retail and corporate customers. For their customer contact information will be fundamentally different from each other.

    So, as they say in France, “Vive la difference!”

    If we can govern it and measure it, we should be able to deal with complication – as long as it is beneficial to the company to have differing views of its business.


