Four Canonical Techniques That Really Work (Or Not)

Several years ago I had the opportunity to participate in a post-mortem study of a $100 million project failure. No one likes to be associated with a project failure, but in this case it was fortunate: the write-off was large enough to force the team to take a very hard look at root causes rather than settle for a cursory analysis. As a result we finally got to the heart of a challenge that has been plaguing data architects and designers for 20 years: how to effectively use canonical data models.

On paper, the concept of a canonical data model looks straightforward. Wikipedia defines it as:

A design pattern used to communicate between different data formats. A form of Enterprise Application Integration, it is intended to reduce costs and standardize on agreed data definitions associated with integrating business systems.

All large organizations have many applications which were developed based on different data models and data definitions, yet they all need to share data. So the basic argument for a canonical data model is “let’s use a common language when exchanging information between systems.” In its simplest form, if system A wants to send data to system B, it translates the data into an agreed-upon standard syntax (canonical format) without regard to the data structures, syntax or protocol of system B.  When B receives the data, it translates the canonical format into its internal structures and format and everything is good.
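The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CanonicalCustomer` type, the field names on both sides, and the two adapter functions are all hypothetical and do not come from the project discussed here. The point is that each adapter knows its own system's layout and the canonical one, and nothing about the system at the other end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical agreed-upon canonical format (an assumption for this sketch)."""
    customer_id: str
    full_name: str
    country_code: str  # assumed to be a two-letter upper-case code


def system_a_to_canonical(record: dict) -> CanonicalCustomer:
    """System A's outbound adapter: knows A's layout and the canonical one only."""
    return CanonicalCustomer(
        customer_id=str(record["cust_no"]),
        full_name=f'{record["first"]} {record["last"]}',
        country_code=record["ctry"].upper(),
    )


def canonical_to_system_b(msg: CanonicalCustomer) -> dict:
    """System B's inbound adapter: knows the canonical layout and B's own only."""
    return {
        "ID": msg.customer_id,
        "NAME": msg.full_name.upper(),
        "COUNTRY": msg.country_code,
    }


# A record in system A's (hypothetical) internal layout.
a_record = {"cust_no": 1042, "first": "Ada", "last": "Lovelace", "ctry": "gb"}

# A translates to canonical; B translates from canonical to its own layout.
b_record = canonical_to_system_b(system_a_to_canonical(a_record))
print(b_record)  # {'ID': '1042', 'NAME': 'ADA LOVELACE', 'COUNTRY': 'GB'}
```

Note that the data is translated twice on its way from A to B, once into the canonical form and once out of it.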

The benefit of this approach is that systems A and B are now decoupled; if A needs to be upgraded, enhanced or replaced, then as long as it continues to reformat the data for external systems into canonical form, neither B nor any other system is impacted. Simple and elegant! Under this approach, organizations with hundreds of application systems can now happily upgrade and enhance individual systems without the complexity (and cost/risk) of coordinating the analysis, design and testing with all the other systems.

However, there is a catch. While A and B are no longer coupled to each other, they are now coupled to the canonical physical format. So now if the canonical model changes, every single system that uses it to send or receive information is potentially impacted. As a result, the propagation cost, which I wrote about in Release Management Is Waste, approaches 100%. The resulting endless analysis and discussions among business analysts, architects and designers of any proposed change to the canonical model were one of the main reasons why the $100 million project failed – it essentially paralyzed the project team.
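A rough way to see why the cost approaches 100% is to count how many systems are potentially impacted by a change. The sketch below uses a simplified reading of propagation cost, namely the share of systems that directly or transitively depend on the thing that changed. The five system names and the dependency map are made up for illustration.

```python
def impacted_by(changed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every system that directly or transitively depends on `changed`."""
    impacted: set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for system, deps in depends_on.items():
            if current in deps and system not in impacted:
                impacted.add(system)
                frontier.append(system)
    return impacted


# Hypothetical landscape: five systems, each depending only on the canonical model.
systems = ["A", "B", "C", "D", "E"]
depends_on = {name: {"canonical"} for name in systems}
depends_on["canonical"] = set()

# Share of systems potentially impacted by a change to the canonical model
# versus a change to system A alone.
share_canonical = len(impacted_by("canonical", depends_on)) / len(systems)
share_a = len(impacted_by("A", depends_on)) / len(systems)
print(share_canonical, share_a)  # 1.0 0.0
```

In this toy landscape a change to system A touches nobody else, which is the decoupling benefit, while a change to the canonical model touches everyone.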

The use of canonical techniques hurt the project in other ways too, including poor performance and an inability to meet required data delivery service levels (due to double translations), information loss, and orchestration complexity, to name a few.

The good news is that I gained some powerful insights from the post-mortem analysis – much of it ended up in the Lean Integration book starting on page 343. In a nutshell, I identified four different canonical techniques.  Each technique, when used properly, addresses specific challenges and is extremely powerful and valuable, even essential, in a large-scale enterprise application integration context.  But if used incorrectly or in the wrong context, the techniques result in total failure. The four techniques are:

This blog posting is already long enough, so I promise to post more articles in the coming weeks with additional details on the pros and cons of each approach, and best practices to maximize the benefits of each. In the meantime, what has been your experience?  Have you seen any failures – or have all your experiences with canonical models been positive?


3 Responses to Four Canonical Techniques That Really Work (Or Not)

  1. David Vancina says:

    I was intrigued by the comment, “The resulting endless analysis and discussions among business analysts, architects and designers of any proposed change to the canonical model were one of the main reasons why the $100 million project failed – it essentially paralyzed the project team.”

    Were the problems truly intractable? To what extent was this a management failure, as opposed to an architecture failure?

  2. John Schmidt says:

    One of my favorite architecture papers is the Big Ball of Mud written in 2000 by Brian Foote and Joseph Yoder of the University of Illinois. It is a seminal paper that questions why architectural anti-patterns keep recurring. One section of the paper explores the concept of “shearing layers” – the notion that any complex object (like an application or the enterprise system-of-systems) consists of a number of layers of components, each of which has a different rate of change. A building structure, for example, changes at a different rate than the “skin” of the building, the stuff in it (furniture, carpets, etc.) or even the services that the building provides. The different rates of change are one of the key factors that end up defining different roles and professions (architects vs. interior designers vs. operations managers).

    I take the position that this was an architectural failure and not a management failure for two reasons: (1) canonical physical objects are only manageable if they are relatively stable, and (2) the canonical model was operationalized. On the first point, the architects should have recognized that a large complex canonical model is not a stable object – it is constantly changing due to business, technology, and regulatory changes. Business processes in particular are constantly changing, and since the meaning of data is impacted by the process context, it is not stable. The larger the business object, the more frequently it will change.

    Second, since the architects imposed the canonical format on physical objects, they pulled the operations staff and organization into the equation. Operations staff are the people who carry pagers – they get called when something breaks in production – so they are very careful and want to make sure that any change to production systems won’t break something. Therefore they demand (rightly so) a very detailed impact analysis before any change is made to the production systems.

    In short, the architects should not have tried to operationalize a canonical model that is not stable. As I mentioned in the article, canonical models decouple systems from each other – but all systems are coupled to the canonical. When the model changes and operations might be impacted, the analysis turns into paralysis.

  3. Pingback: Big Ball of Mud « Another Word For It
