To continue from my prior blog article on this topic, loose coupling between applications in an enterprise portfolio is an IT architect’s dream. If two or more applications are tightly coupled, it becomes very difficult to change or enhance one without impacting the others. Loosely coupled applications, on the other hand, can be enhanced independently with little or no impact on other systems. The net result is the ability to rapidly change the IT portfolio in response to business opportunities. In short, organizational agility becomes a competitive weapon. But is this dream achievable, or is it only wishful thinking?
Just because something is a best practice doesn’t mean it actually works. Blood-letting was a best practice in the 19th century until the medical profession developed a more accurate model of how the human body works. Canonical data models fall into the same camp. Canonicals are a decades-old concept in IT, but the associated best practices have only recently come of age to the point where they are effective.
The canonical concept is simple, and it stands in contrast to point-to-point (P2P) integration. The P2P approach directly converts data from the internal format of one custom application to the internal format of another. While this is a quick and simple solution, it has a major drawback: the two applications are now tightly coupled, such that when one of them changes it impacts the other. MacCormack, Rusnak and Baldwin of Harvard Business School describe this as the propagation cost, a key factor in the high cost of application maintenance and business change initiatives as the number of integration points increases.
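To make the propagation problem concrete, here is a minimal sketch of P2P integration. The system and field names are hypothetical; the point is that every ordered pair of systems needs its own converter, so the mapping count grows quadratically and any format change can ripple outward.

```python
# Point-to-point integration: one bespoke converter per ordered pair of systems.
# All system names and field names below are illustrative, not a real schema.

def crm_to_billing(crm_record):
    """Direct conversion from the CRM's internal format to billing's format."""
    return {
        "acct_no": crm_record["customerId"],
        "amt_due": crm_record["balance"],
    }

def billing_to_shipping(billing_record):
    """Another bespoke converter; a change to billing's format breaks this too."""
    return {
        "account": billing_record["acct_no"],
        "owed": billing_record["amt_due"],
    }

def p2p_converter_count(n):
    """Full P2P connectivity among n systems needs n * (n - 1) converters."""
    return n * (n - 1)

print(p2p_converter_count(10))  # 90 converters to build and maintain
```

With ten systems, a change to one internal format can force rework in up to nine mappings, which is exactly the propagation cost described above.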
In contrast to P2P, the canonical approach has each system translate its internal data into a common format when exchanging information. Under this approach, each application team need only concern itself with two data models rather than dozens – its internal data model and the canonical data model. While this sounds fantastic, once again there are some drawbacks.
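The same scenario through a canonical format looks like this (again with hypothetical names): each system owns exactly two mappings, to and from the canonical form, and any-to-any exchange is composed through it.

```python
# Canonical integration: each system maintains exactly two adapters,
# to and from a shared canonical format. Names are illustrative only.

def crm_to_canonical(crm_record):
    """CRM's outbound adapter into the canonical format."""
    return {"customer_id": crm_record["customerId"],
            "balance": crm_record["bal"]}

def canonical_to_billing(canonical_record):
    """Billing's inbound adapter from the canonical format."""
    return {"acct_no": canonical_record["customer_id"],
            "amt_due": canonical_record["balance"]}

def crm_to_billing(crm_record):
    """Any-to-any exchange composes through the canonical form."""
    return canonical_to_billing(crm_to_canonical(crm_record))

# With n systems, only 2 * n adapters are needed instead of n * (n - 1),
# and a change to one system's internal format touches only its own pair.
print(crm_to_billing({"customerId": "C42", "bal": 125.50}))
```

The trade is quadratic growth in converters for linear growth in adapters, at the cost of agreeing on and governing the canonical model itself.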
First, we need to use the right canonical technique for the job. My prior posting listed four techniques – let me elaborate.
- Canonical Data Modeling is appropriate when used by enterprise application suite vendors such as SAP or Oracle, or for internal custom-built applications such as data warehouses or MDM solutions. The intention of this technique is to eliminate the need to transform data as it moves around within the confines of the system, because the data has the same definition everywhere. For example, the data entity “average account balance” has exactly the same definition and meaning in all the data warehouse tables that use it.
- Canonical Interchange Modeling is a design-time technique to facilitate rapid data mapping analysis and design. This technique uses a logical data model and business glossary tailored to a specific industry (banking, telecom, insurance, etc.) or business domain (finance, HR, manufacturing, sales, etc.). Data models from operational systems are mapped to the logical data model, which helps business analysts and data stewards perform rapid impact analysis (e.g. finding all the systems that store credit card numbers) or quickly map data from system A to system B, since both are mapped to common entities in the canonical model.
- Canonical Physical Formats achieve extreme loose coupling, such as in a B2B context, but only if used properly. The canonical object in this case is a “message” – often in XML format, though any agreed-upon standard, including .csv or ASCII flat files, will work. The key requirements are that the message definition be relatively stable (changes to the definition happen infrequently) and that the definition include a specific business process context.
- Canonical Business Objects are effective for passing complex objects in serialized form between different applications. A good example is an invoice object that is passed from the order entry system to the fulfillment system, the delivery system and the invoicing system. The end-to-end process may pass the invoice object (which could contain customer, product, pricing, delivery and billing information) back and forth numerous times as the order changes state or as the customer requests changes. A more modern alternative is dynamic Data Services as implemented in Informatica’s integration platform.
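The last technique can be sketched briefly. Below, a hypothetical invoice object is serialized to JSON so that order entry, fulfillment and invoicing can each rehydrate it, update its state and pass it along; the field names and status values are illustrative, not a real schema.

```python
# Hedged sketch of a canonical business object: an invoice serialized for
# hand-off between applications. Fields and states are hypothetical.
import json
from dataclasses import dataclass, asdict

@dataclass
class Invoice:
    order_id: str
    customer_id: str
    line_items: list
    status: str = "entered"  # e.g. entered -> fulfilled -> invoiced

    def serialize(self) -> str:
        """The wire format exchanged between systems."""
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, payload: str) -> "Invoice":
        return cls(**json.loads(payload))

# Order entry creates and serializes the object...
wire = Invoice("O-1001", "C-42", [{"sku": "A1", "qty": 2}]).serialize()

# ...and fulfillment rehydrates it, changes its state, and passes it on.
inv = Invoice.deserialize(wire)
inv.status = "fulfilled"
print(inv.serialize())
```

Every participating system codes against the one shared object definition rather than against each peer's internals, which is precisely the loose coupling the technique promises.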
Second, each of these canonical techniques is yet another “system” that requires staff and resources to maintain. Canonical models are not static, unchanging objects; they need to be maintained and evolved with appropriate tools and support. Therefore, we should make the decision to use a canonical technique the same way we make the decision to implement an application system: is there an ROI or financial payback? If the benefits of loose coupling and a simplified integration infrastructure outweigh the costs of developing and sustaining the canonical technique, then go for it!
Alan MacCormack, John Rusnak, Carliss Y. Baldwin, “Exploring the Duality between Product and Organizational Architectures: A Test of the Mirroring Hypothesis,” Harvard Business School Working Paper, Version 3.0, October 10, 2008