If You Have A Data Integration Round Hole, Don’t Buy Square Pegs

I’m often taken aback by the focus on technology as “the solution,” and not as an approach to the solution. One of the reasons I blog for Informatica is that they focus on the solution and not the tool. Believe me, vendors will sell plenty of tools if they provide those tools in the context of the solution.

So, how do you find the right data integration tool? It’s really a matter of understanding your own data integration requirements, and creating baseline models of the existing “as is” state. This means you must understand your data at the structure and model levels, or, more simply put, you must understand what you have, where you have it, and what it’s doing.

There are a few fundamentals to this process, including artifacts you need to create. One is the data catalog, which describes each data element and its purpose. Typically a data catalog already exists, and if it is active, it is updated automatically. However, most are passive and usually out of date. A data catalog gives you a semantic understanding of your data, which is critical to an effective data integration project.

Next is the schema, which defines how the data is structured and configured, and how the data elements relate to one another. But, of course, there is more. You also need to understand performance requirements, data quality requirements, data governance issues, and compliance, just to name a few.
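To make the catalog idea concrete, here is a minimal sketch in Python of what a catalog entry might record, plus a check for entries that have gone out of date. The field names, the `stale_entries` helper, and the 180-day threshold are all assumptions made for illustration; a real catalog will have its own model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CatalogEntry:
    """One data element in a hypothetical data catalog."""
    name: str            # the data element
    purpose: str         # what it is for (the semantic part)
    source_system: str   # where it lives
    data_type: str       # how it is stored
    last_reviewed: date  # when someone last confirmed the entry


def stale_entries(catalog, today, max_age_days=180):
    """Return names of entries not reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [entry.name for entry in catalog if entry.last_reviewed < cutoff]


# Illustrative data only.
catalog = [
    CatalogEntry("customer_id", "Unique customer key", "crm", "int", date(2024, 1, 10)),
    CatalogEntry("order_total", "Order value in USD", "erp", "decimal", date(2022, 3, 5)),
]

print(stale_entries(catalog, today=date(2024, 6, 1)))
```

A passive catalog tends to accumulate entries like the second one, which is why it is worth checking freshness before you lean on the catalog for requirements.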

While all of this seems like a lot of work, typically it’s not, and it’s time well spent considering where we are heading: the target data integration architecture and the selection of the tool. If you get your baseline requirements right, the solution is a mere logical conclusion.

Selecting the technology is a matter of understanding the solution patterns required to meet your data integration requirements. There are a few core meta-patterns to consider here, including semantic mediation, data quality, integrity, and usage scenarios such as BI and operational data. Keep in mind that you may deploy multiple integration technologies within the same problem domain, and that should not be a problem if they are the proper tools.

A larger issue is that many don’t consider tools that support good data quality assurance mechanisms, which, in some cases, require a staging area. Instead they opt for tools that have the “ESB” label and thus typically only support ESB patterns: in essence, simple messaging systems with service interfaces. That’s a square data integration peg, when most problem domains have round holes.
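As a rough illustration of why a staging area matters for data quality, the Python sketch below holds incoming rows in staging, validates each one, and promotes only the clean rows to the target. The rules in `validate` and the row fields are hypothetical; real quality rules come from your own requirements.

```python
def validate(row):
    """Return a list of data quality problems found in one staged row."""
    problems = []
    if row.get("customer_id") is None:
        problems.append("missing customer_id")
    email = row.get("email") or ""
    if "@" not in email:
        problems.append("invalid email")
    return problems


def promote_from_staging(staged_rows):
    """Split staged rows into clean rows and rejected rows with reasons."""
    clean, rejected = [], []
    for row in staged_rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            clean.append(row)
    return clean, rejected


# Illustrative data only.
staged = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": None, "email": "bad"},
]
clean, rejected = promote_from_staging(staged)
print(len(clean), "clean;", len(rejected), "rejected")
```

A pure messaging tool passes each record straight through, so there is no point at which the rejected rows can be held back and inspected.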

In addition, your tool should support the transfer and cleansing of a large amount of data from source to target. This is another “peg” issue to consider: Many start with a minimal data transfer load, and then find out they picked the wrong tool when the load increases, as it always does.

The path to success here is to plan and to test. You have a bunch of choices out there, and the reality is that most are the wrong choices.

