Several years ago I had the fortunate opportunity to participate in a post-mortem study of a $100 million project failure. No one likes to be associated with a project failure, but in this case it was fortunate: the size of the write-off was large enough that it forced the team to take a very hard look at root causes rather than settle for a cursory analysis. As a result we finally got to the heart of a challenge that has been plaguing data architects and designers for 20 years – how to effectively use canonical data models.
On paper, the concept of a canonical data model looks straightforward. Wikipedia defines it as:
A design pattern used to communicate between different data formats. A form of Enterprise Application Integration, it is intended to reduce costs and standardize on agreed data definitions associated with integrating business systems.
All large organizations have many applications which were developed based on different data models and data definitions, yet they all need to share data. So the basic argument for a canonical data model is “let’s use a common language when exchanging information between systems.” In its simplest form, if system A wants to send data to system B, it translates the data into an agreed-upon standard syntax (canonical format) without regard to the data structures, syntax or protocol of system B. When B receives the data, it translates the canonical format into its internal structures and format and everything is good.
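The translation path described above can be sketched in a few lines. This is a minimal, hypothetical illustration – the field names (`custNo`, `customer_id`, and so on) are invented for the example, not part of any real standard:

```python
# Hypothetical sketch of canonical exchange: System A translates its
# internal record into an agreed canonical format, and System B
# translates the canonical format into its own internal structures.
# Neither system ever sees the other's internals.

def a_to_canonical(record_a):
    """System A's outbound adapter: A's internal format -> canonical."""
    return {
        "customer_id": record_a["custNo"],
        "full_name": f"{record_a['fname']} {record_a['lname']}",
    }

def canonical_to_b(canonical):
    """System B's inbound adapter: canonical -> B's internal format."""
    return {
        "id": canonical["customer_id"],
        "name": canonical["full_name"].upper(),  # B stores names uppercased
    }

record_a = {"custNo": 42, "fname": "Ada", "lname": "Lovelace"}
record_b = canonical_to_b(a_to_canonical(record_a))
# record_b == {"id": 42, "name": "ADA LOVELACE"}
```

Note the double translation on every exchange – A-to-canonical, then canonical-to-B – which matters later in this post.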
The benefit of this approach is that system A and B are now decoupled; if A needs to be upgraded, enhanced or replaced, as long as it continues to reformat the data for external systems into canonical form, then B or any other system is not impacted. Simple and elegant! Under this approach, organizations with hundreds of application systems can now happily upgrade and enhance individual systems without the complexity (and cost/risk) of coordinating the analysis, design and testing with all the other systems.
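The decoupling benefit can be shown the same way: replace System A's implementation entirely, and as long as its outbound adapter still emits the agreed canonical format, System B is untouched. Again, all names here are illustrative assumptions:

```python
# Hypothetical sketch: System A is rewritten with completely different
# internals. Because both versions honor the same canonical contract,
# System B's inbound adapter requires no change at all.

def old_a_outbound():
    # Original System A: dict-based internals.
    return {"customer_id": 42, "full_name": "Ada Lovelace"}

def new_a_outbound():
    # Replacement System A: tuple-based internals, same canonical output.
    first, last, cust_id = ("Ada", "Lovelace", 42)
    return {"customer_id": cust_id, "full_name": f"{first} {last}"}

def b_inbound(canonical):
    # System B's adapter, written once against the canonical format.
    return {"id": canonical["customer_id"], "name": canonical["full_name"]}

# B cannot tell the two versions of A apart.
assert b_inbound(old_a_outbound()) == b_inbound(new_a_outbound())
```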
However, there is a catch. While A and B are no longer coupled to each other, they are now both coupled to the canonical physical format. If the canonical model changes, every single system that uses it to send or receive information is potentially impacted. As a result, the propagation cost, which I wrote about in Release Management Is Waste, approaches 100%. The endless analysis and discussion among business analysts, architects and designers over any proposed change to the canonical model was one of the main reasons the $100 million project failed – it essentially paralyzed the project team.
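To make that coupling concrete, consider a hypothetical canonical change: version 2 of the format splits a single `full_name` field into `given_name` and `family_name`. Every adapter written against version 1 breaks at once (the field names are invented for the example):

```python
# Hypothetical sketch: a canonical format change fans out to every
# system. One field rename breaks all adapters built against v1,
# which is why canonical-model changes demand enterprise-wide
# analysis, coordination, and testing.

canonical_v1 = {"customer_id": 42, "full_name": "Ada Lovelace"}
canonical_v2 = {"customer_id": 42,
                "given_name": "Ada", "family_name": "Lovelace"}

def b_adapter_v1(canonical):
    # System B's adapter, written against version 1 of the canonical
    # format; it assumes a 'full_name' field exists.
    return {"id": canonical["customer_id"], "name": canonical["full_name"]}

b_adapter_v1(canonical_v1)      # works
try:
    b_adapter_v1(canonical_v2)  # breaks: KeyError on 'full_name'
except KeyError as e:
    print(f"adapter broken by canonical change: {e}")
```

Multiply that failure by every sending and receiving system in the enterprise, and the propagation problem becomes clear.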
The use of canonical techniques had other consequences for the project as well, including poor performance and an inability to meet required data-delivery service levels (due to the double translation on every exchange), information loss, and orchestration complexity, to name a few.
The good news is that I gained some powerful insights from the post-mortem analysis – much of it ended up in the Lean Integration book starting on page 343. In a nutshell, I identified four different canonical techniques. Each technique, when used properly, addresses specific challenges and is extremely powerful and valuable, even essential, in a large-scale enterprise application integration context. But if used incorrectly or in the wrong context, the techniques result in total failure. The four techniques are:
- Canonical Data Modeling, for custom built applications, data warehouses or MDM solutions
- Canonical Interchange Modeling, for lean and agile data mapping analysis and design
- Canonical Physical Formats, for extreme loose coupling such as in a B2B context
- Canonical Business Objects, for dynamic Data Services
This blog posting is already long enough, so I promise to post more articles in the coming weeks with additional details on the pros and cons of each approach, and best practices to maximize the benefits of each. In the meantime, what has been your experience? Have you seen any failures – or have all your experiences with canonical models been positive?