The Top 3 Data Integration Mistakes, and How to Avoid Them

Data integration is complex for most enterprises, but it doesn’t need to be.  Even though data integration is a well-understood IT discipline, there is still a lot of confusion around it, and that confusion leads to mistakes.  Many of these mistakes are avoidable.

To that point, we need to consider the patterns of failure in order to understand the patterns that will lead to success.  The goal is not to dwell on the negative, but to recognize that the mistakes we or others make can provide the best lessons.

So, here are the three most common mistakes that I see out there.  I’ll also tell you how to avoid them:

Mistake 1: Fail to understand the types of data you will integrate.

While this seems obvious, most major data integration mistakes can be traced back to a failure to understand what data exists in the source and target systems.  There could be data in block storage, data stored as objects, data in traditional relational systems, and even data in proprietary data stores.

Data should be defined in terms of its physical storage as well as its structure, or lack of structure, if that’s the case.  From there, determine which approach to data integration is best, including how the data will be transformed and translated in flight, and whether structure needs to be applied before the data is consumed by the data integration engine.
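
One way to make that definition concrete is a simple inventory that records the storage type and structure of each source.  The sketch below is only an illustration: the source names, the category labels, and the rule that anything not fully structured needs structure applied first are all assumptions, not a prescribed method.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    BLOCK = "block"
    OBJECT = "object"
    RELATIONAL = "relational"
    PROPRIETARY = "proprietary"


class Structure(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class DataSource:
    name: str
    storage: Storage
    structure: Structure


def needs_structure_first(source: DataSource) -> bool:
    """Assumed rule: anything not fully structured must have structure
    applied before the integration engine consumes it."""
    return source.structure is not Structure.STRUCTURED


# Hypothetical sources, used only to show the idea.
inventory = [
    DataSource("orders_db", Storage.RELATIONAL, Structure.STRUCTURED),
    DataSource("clickstream_logs", Storage.OBJECT, Structure.SEMI_STRUCTURED),
    DataSource("scanned_contracts", Storage.BLOCK, Structure.UNSTRUCTURED),
]

to_prepare = [s.name for s in inventory if needs_structure_first(s)]
print(to_prepare)
```

Even a list this small forces the question of what each source really contains before any integration tooling is chosen.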

Mistake 2: Fail to consider performance.

The assumption is often that data integration technology has no latency.  That’s never the case.  If you consume a great deal of data from many source systems, the processing applied to that data in flight largely determines the performance of the data integration solution.  If the processing is I/O-intensive or complex, things will be slow.  If there is little processing, things will be faster.

The only way to deal with performance is to understand the target integration technology, as well as the use cases that you plan to support.  Not understanding those pieces means performance is difficult to predict, and you could end up failing just because the solution is too slow in production.  This is a difficult problem to solve after the fact.
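
A rough way to see where in-flight processing spends its time is to measure each stage separately before going to production.  The following is a minimal sketch with made-up stages and sample records; `run_stage`, `rename_field`, and `enrich` are hypothetical names, and a real integration engine would have its own instrumentation.

```python
import time
from typing import Callable, Dict, List


def run_stage(name: str, transform: Callable[[dict], dict],
              records: List[dict], timings: Dict[str, float]) -> List[dict]:
    """Apply one transformation to every record and record the elapsed time."""
    start = time.perf_counter()
    output = [transform(record) for record in records]
    timings[name] = time.perf_counter() - start
    return output


def rename_field(record: dict) -> dict:
    # Light processing: a simple field translation.
    return {"customer_id": record["id"], "amount": record["amount"]}


def enrich(record: dict) -> dict:
    # Heavier processing: stands in for a complex in-flight computation.
    score = sum(i * i for i in range(2_000)) % 97
    return {**record, "score": score}


# Made-up sample data standing in for records from a source system.
records = [{"id": i, "amount": i * 1.5} for i in range(1_000)]
timings: Dict[str, float] = {}

staged = run_stage("rename", rename_field, records, timings)
staged = run_stage("enrich", enrich, staged, timings)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Running a measurement like this against realistic data volumes gives an early signal of which stages will dominate latency, while the design is still cheap to change.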

Mistake 3: Forget about security and governance.

Security should not be an afterthought.  Indeed, it should be systemic to the data integration solution.  This includes identity and access management, as well as encryption.  We need to deal with compliance issues as well, since many laws and regulations often determine how data must be handled.

Governance, especially data governance, is important as well.  Just as you need to understand your data, you also need to control how your data changes over time, and use policies to restrict who can access and change it.
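
As an illustration of restricting access and change through policies, here is a minimal sketch of a deny-by-default check.  The dataset name, the roles, and the `is_allowed` function are hypothetical; in practice this would be handled by an identity and access management system rather than hand-written code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class DataPolicy:
    readers: FrozenSet[str]
    writers: FrozenSet[str]


# Hypothetical policy table: one entry per governed dataset.
POLICIES: Dict[str, DataPolicy] = {
    "customer_records": DataPolicy(
        readers=frozenset({"analyst", "integration_service"}),
        writers=frozenset({"integration_service"}),
    ),
}


def is_allowed(role: str, dataset: str, action: str) -> bool:
    """Return True only if a policy explicitly grants the action."""
    policy = POLICIES.get(dataset)
    if policy is None:
        return False  # No policy defined: deny by default.
    if action == "read":
        return role in policy.readers
    if action == "write":
        return role in policy.writers
    return False  # Unknown actions are denied.


print(is_allowed("analyst", "customer_records", "read"))
print(is_allowed("analyst", "customer_records", "write"))
```

Denying anything without an explicit policy keeps ungoverned data from flowing through the integration layer unnoticed.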

These three mistakes are avoidable. However, they are also common.  As we move forward with data integration technology, we need to consider how things can go wrong, as well as how they can go right.  Studying history is sometimes the most productive path.