Tag Archives: downstream
We have been looking at how data management issues can be classified. In my last post I provided five categories, broken down into two groups: Systemic and System. The systemic issues are ones in which process or management gaps allow data flaws to be introduced. A good example occurs when consumers of reports from the data warehouse insist that the data sets are incomplete, and the root cause is that the processes in which the data is initially collected or created do not comply with the downstream requirement for capturing the missing values.
Putting an MDM Hub in place can help make system integration easier and less costly. The reason is simple: a system integration strategy that relies on a single data model, to which all the systems involved refer, is much less complex than one based on point-to-point integration. The Hub becomes a sort of Rosetta Stone, and the data model on which it’s based is the target of the system integration. Getting data into the Hub – transforming it, cleansing it, standardizing it – is a fundamental part of any Siperian implementation. Getting the data out of the Hub and into downstream systems is equally important, so that those systems benefit from the corrected information and business users can leverage the data operationally.
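To put a rough number on that difference in complexity, here is a small back-of-the-envelope sketch. It assumes the worst case for point-to-point integration, where every system exchanges data with every other system and each direction needs its own mapping; real landscapes are usually sparser than that. The function names are made up for illustration.

```python
def point_to_point_mappings(n_systems: int) -> int:
    """Directed mappings needed if every system talks to every other system.

    Worst-case assumption: each ordered pair (A -> B) needs its own mapping.
    """
    return n_systems * (n_systems - 1)


def hub_mappings(n_systems: int) -> int:
    """Mappings needed if each system only maps to and from the Hub's data model.

    One inbound and one outbound mapping per system.
    """
    return 2 * n_systems


for n in (5, 20, 100):
    print(
        f"{n:>3} systems: "
        f"point-to-point={point_to_point_mappings(n):>5}, "
        f"hub={hub_mappings(n):>3}"
    )
```

Under these assumptions the point-to-point count grows with the square of the number of systems, while the hub count grows linearly, which is the intuition behind the Rosetta Stone comparison.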
The biggest hurdle to overcome in getting golden copy data into the systems that require it is authoring and maintaining the transformation definitions. Regardless of the transport mechanism – batch, JMS, SOAP, etc. – the data will have to be transformed so that its format and values are usable by the receiving system. If the receiving systems are numerous and heterogeneous – and in some cases there are hundreds or even a few thousand of them – then creating the transformation definitions and keeping them up to date is a substantial task. While there is no silver bullet that will magically solve the problem with a single click, there are some tools and techniques that can help decrease the effort and cost required:
Use a Transformation Discovery Tool: These tools work by first profiling the source and the target data stores (which means you need some data in your Hub, as well as in the destination). After profiling, they look for what are called “binding conditions”. A binding condition ties together a set of attributes in the source system with a set of attributes in the target system that have a high probability of representing the same information. Once a binding condition is defined, the tool determines the logic that will transform the source data into the destination format. The output varies with the tool, but is usually expressed either in pseudo-code or in SQL. When the number of downstream systems is high, and especially if the data model is complex, using a tool to help define the “guts” of the transformations can save a tremendous amount of time and money compared to doing it by hand.
Have a Single Transformation Platform: This may seem obvious, but a surprising number of system integration efforts end up being implemented piecemeal – each receiving system implements its own set of adapters in whatever language and with whatever application-specific tools it has on hand. This has the extremely undesirable effect of scattering the transformation logic throughout an organization, which makes maintenance and management of the integration a nightmare. To avoid this, keep all of the transformations in a single platform, with a single authoring environment, and preferably a single execution environment. Not only will this greatly decrease the complexity and cost of maintaining the downstream data syndication, it will also make it possible to reuse transformation and validation logic across receiving systems.
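To make the idea of a binding condition from the first tip a little more concrete, here is a deliberately simplified sketch. It is not how any particular discovery tool works: it assumes that comparing single columns by the overlap of their distinct values (Jaccard similarity) is a good enough stand-in for "high probability of representing the same information", whereas real tools also consider sets of attributes and derive the transformation logic. All function names, column names, and sample rows are hypothetical.

```python
from itertools import product


def profile(rows: list[dict]) -> dict[str, set]:
    """Profile a data store: collect the distinct normalized values per column."""
    columns: dict[str, set] = {}
    for row in rows:
        for name, value in row.items():
            # Normalize case and whitespace so "ACME CORP" matches "Acme Corp".
            columns.setdefault(name, set()).add(str(value).strip().lower())
    return columns


def jaccard(a: set, b: set) -> float:
    """Share of distinct values that the two columns have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_bindings(source_rows, target_rows, threshold=0.5):
    """Return (source_column, target_column, score) pairs at or above threshold."""
    source_profile = profile(source_rows)
    target_profile = profile(target_rows)
    matches = []
    for (s_name, s_vals), (t_name, t_vals) in product(
        source_profile.items(), target_profile.items()
    ):
        score = jaccard(s_vals, t_vals)
        if score >= threshold:
            matches.append((s_name, t_name, score))
    # Strongest candidates first.
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical sample data: a few Hub records and a few downstream CRM records.
hub_rows = [
    {"customer_name": "Acme Corp", "country_code": "US"},
    {"customer_name": "Globex", "country_code": "DE"},
    {"customer_name": "Initech", "country_code": "US"},
]
crm_rows = [
    {"acct": "ACME CORP", "ctry": "us"},
    {"acct": "globex", "ctry": "de"},
    {"acct": "Umbrella", "ctry": "fr"},
]

bindings = candidate_bindings(hub_rows, crm_rows)
for source_col, target_col, score in bindings:
    print(f"{source_col} -> {target_col} (overlap {score:.2f})")
```

A sketch like this also shows why both the Hub and the destination need to hold data before discovery can start: with nothing to profile, there are no values to compare.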
Once the golden copy arrives at its downstream destination, the operational system can leverage it through the normal application interfaces and processes. Some might ask whether it is appropriate to update the downstream records, or whether alternative business processes or interfaces should be used directly against the Hub itself. These are good questions, and once again they speak to the criticality of having a Hub that can provide these options if and when business needs dictate. Transformation is a continuous process, not only for your data, but for your business as well.