Four Canonical Techniques That Really Work (Or Not)

Several years ago I had the fortunate opportunity to participate in a post-mortem study of a $100 million project failure. No one likes to be associated with a project failure, but in this case it was fortunate: the size of the write-off was large enough that it forced the team to take a very hard look at root causes rather than settle for a cursory analysis. As a result we finally got to the heart of a challenge that has been plaguing data architects and designers for 20 years – how to effectively use canonical data models.

On paper, the concept of a canonical data model looks straightforward. Wikipedia defines it as:

A design pattern used to communicate between different data formats. A form of Enterprise Application Integration, it is intended to reduce costs and standardize on agreed data definitions associated with integrating business systems.

All large organizations have many applications which were developed based on different data models and data definitions, yet they all need to share data. So the basic argument for a canonical data model is “let’s use a common language when exchanging information between systems.” In its simplest form, if system A wants to send data to system B, it translates the data into an agreed-upon standard syntax (canonical format) without regard to the data structures, syntax or protocol of system B.  When B receives the data, it translates the canonical format into its internal structures and format and everything is good.

The benefit of this approach is that system A and B are now decoupled; if A needs to be upgraded, enhanced or replaced, as long as it continues to reformat the data for external systems into canonical form, then B or any other system is not impacted. Simple and elegant!  Under this approach, organizations with hundreds of application systems can now happily upgrade and enhance individual systems without the complexity (and cost/risk) of coordinating the analysis, design and testing with all the other systems.
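The A-to-canonical-to-B flow above can be sketched in a few lines of code. This is a minimal, hypothetical illustration – the system names, field names and canonical schema here are invented for the example, not taken from any real project:

```python
# Hypothetical sketch: each system maps its internal record to and from an
# agreed canonical format, so A and B never see each other's schemas.

def a_to_canonical(record: dict) -> dict:
    # System A stores the customer name in a single "cust_nm" field
    # and the amount as a float in "amt".
    first, _, last = record["cust_nm"].partition(" ")
    return {
        "customer": {"first_name": first, "last_name": last},
        "order_total": record["amt"],
    }

def canonical_to_b(canonical: dict) -> dict:
    # System B wants separate name fields and an integer amount in cents.
    c = canonical["customer"]
    return {
        "FIRST": c["first_name"],
        "LAST": c["last_name"],
        "TOTAL_CENTS": int(round(canonical["order_total"] * 100)),
    }

# A sends; B receives. Neither system knows the other's internal layout.
message = a_to_canonical({"cust_nm": "Ada Lovelace", "amt": 12.5})
b_record = canonical_to_b(message)
```

If System A is later replaced, only `a_to_canonical` needs to be rewritten; System B and its translator are untouched – which is exactly the decoupling benefit described above.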

However, there is a catch. While A and B are no longer coupled to each other, they are now coupled to the canonical physical format. So if the canonical model changes, every single system that uses it to send or receive information is potentially impacted. As a result, the propagation cost, which I wrote about in Release Management Is Waste, approaches 100%. The endless analysis and discussion among business analysts, architects and designers over any proposed change to the canonical model was one of the main reasons why the $100 million project failed – it essentially paralyzed the project team.
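The coupling can be made concrete with a toy inventory (hypothetical system names, not the actual project's landscape): if every system maintains a translator pair to and from the canonical model, a single canonical change potentially touches every one of those artifacts.

```python
# Hypothetical sketch: each system registers an inbound and outbound
# translator against the shared canonical model. A change to that model
# means every translator must be re-analyzed, re-mapped and re-tested.

systems = ["ERP", "CRM", "Billing", "Warehouse", "Shipping"]

translators = {s: (f"{s}->canonical", f"canonical->{s}") for s in systems}

# A change to the canonical model: every translator is potentially impacted.
impacted = [t for pair in translators.values() for t in pair]
propagation_cost = len(impacted) / (2 * len(systems))  # fraction impacted
```

Here `propagation_cost` comes out to 1.0 – the 100% propagation cost described above, and the source of the paralysis on a project with hundreds of systems rather than five.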

The use of canonical techniques had other implications for the project as well: poor performance and an inability to meet required data delivery service levels (due to double translation), information loss, and orchestration complexity, to name a few.

The good news is that I gained some powerful insights from the post-mortem analysis – much of it ended up in the Lean Integration book starting on page 343. In a nutshell, I identified four different canonical techniques.  Each technique, when used properly, addresses specific challenges and is extremely powerful and valuable, even essential, in a large-scale enterprise application integration context.  But if used incorrectly or in the wrong context, the techniques result in total failure. The four techniques are:

This blog posting is already long enough, so I promise to post more articles in the coming weeks with additional details on the pros and cons of each approach, and best practices to maximize the benefits of each. In the meantime, what has been your experience?  Have you seen any failures – or have all your experiences with canonical models been positive?

This entry was posted in B2B, Data Integration, Data Services, Data Warehousing, Integration Competency Centers, Master Data Management.

2 Responses to Four Canonical Techniques That Really Work (Or Not)

  1. Rakesh says:

    Nicely explained, waiting for the next blogs.

  2. Your blog is very helpful for those venturing into defining canonical data models. One scenario that we implemented in 2012 and found to be useful is the following:

    The organization had 4 ERPs – 1 SAP instance and 3 JDEdwards instances, one each for Japan, China and Rest of the World (ROW). They had ~50 warehouses located globally. There was a thought to merge all ERPs into one, but no one knew when that would happen. So we adopted a canonical data model (basically Oracle EBOs). All enterprise services/events were published using the EBOs. Yes, it required transformations at 2 levels, a very strict governance process and centralized mappings. Performance was not an issue due to the powerful computing resources. But it provided the flexibility for the ERPs to change without impacting data transfer and mappings to 50 warehouse interfaces. This is one situation where a canonical data model helped. I do not think the same approach would hold if we were sure of having only one ERP in the organization. The best approach would probably be to identify a major application that interacts with 2 or more applications and adopt that major application's data model as the canonical form. The exception, however, is if an application's data model is too cryptic, in which case it cannot be adopted as the canonical form.
