The Pros and Cons: Data Integration from the Bottom-Up and the Top-Down

What are the first steps of a data integration project?  Most people are at a loss.  There are several ways to approach data integration, and the right one depends largely on the size and complexity of your problem domain.

With that said, the two basic approaches to consider are top-down and bottom-up.  You can be successful with either approach.  However, choosing the right one brings certain efficiencies and can significantly reduce risk and cost.  Let’s explore the pros and cons of each approach.

Top-Down

Approaching data integration from the top down means moving from the high-level integration flows down to the data semantics.  In other words, you first define an approach, perhaps even a tool set, based on your requirements, and then define the flows, which are decomposed down to the raw data.
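To make the top-down idea concrete, here is a minimal Python sketch, assuming a simple extract/transform/load shape.  The flow is defined first against abstract source and target interfaces, and concrete implementations are filled in later as the flow is broken down.  All class names (Source, Target, IntegrationFlow, InMemorySource, InMemoryTarget) are hypothetical, chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical record type: one row of data as a field -> value mapping.
Record = Dict[str, Any]


class Source(ABC):
    """Abstract source: at the top level, the flow only knows it yields records."""

    @abstractmethod
    def extract(self) -> Iterable[Record]: ...


class Target(ABC):
    """Abstract target: at the top level, the flow only knows it accepts records."""

    @abstractmethod
    def load(self, records: Iterable[Record]) -> int: ...


class IntegrationFlow:
    """High-level flow, designed before any source or target details are known."""

    def __init__(self, source: Source, target: Target,
                 transform: Callable[[Record], Record]) -> None:
        self.source = source
        self.target = target
        self.transform = transform

    def run(self) -> int:
        # Extract, transform each record, and load; returns the number loaded.
        return self.target.load(self.transform(r) for r in self.source.extract())


# Later, as the flow is decomposed, concrete implementations are written.
# These in-memory versions stand in for real systems (hypothetical).
class InMemorySource(Source):
    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def extract(self) -> Iterable[Record]:
        return iter(self.rows)


class InMemoryTarget(Target):
    def __init__(self) -> None:
        self.rows: List[Record] = []

    def load(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before
```

Note that the transform passed to IntegrationFlow may assume fields the real source does not provide; discovering that only when the concrete source is written is exactly the kind of late rework the top-down approach risks.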

The advantages of this approach include:

The ability to spend time defining the higher levels of abstraction without being limited by the underlying integration details.  Those charged with designing the integration flows are typically most concerned with how they will deal with the underlying sources and targets, and this approach lets them defer that issue until later, as they break down the flows.

The disadvantages of this approach include:

In many instances, the data integration architect does not consider the specific needs of the source or target systems, so some of the higher-level flows may have to be reworked later.  That rework causes inefficiencies and could add risk and cost to the final design and implementation.

Bottom-Up

This is the approach most people choose for data integration.  Indeed, I use it about 75 percent of the time.  The process starts from the native data in the sources and targets and works up to the integration flows.  As a result, those charged with designing the integration flows are typically more concerned with mediating the semantics of the underlying data than with the flows themselves.
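As a rough illustration of working bottom-up, the Python sketch below starts with the native data: each source’s fields and formats are mapped onto one canonical schema (the semantic mediation step), and only then is a flow composed on top.  The two sources, their field names, and their date formats are hypothetical examples, not taken from any real system.

```python
from datetime import date, datetime
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# A field map: canonical field name -> function that extracts it from a native record.
FieldMap = Dict[str, Callable[[Record], Any]]

# Step 1: start from the native data.  Each (hypothetical) source gets its own
# mapping onto the canonical schema, resolving naming and format differences.
CRM_MAP: FieldMap = {
    "customer_name": lambda r: r["cust_name"].strip(),
    # CRM stores dates as MM/DD/YYYY strings (assumed for this example).
    "birth_date": lambda r: datetime.strptime(r["dob"], "%m/%d/%Y").date(),
}

BILLING_MAP: FieldMap = {
    "customer_name": lambda r: r["customerName"].strip(),
    # Billing stores ISO 8601 dates (assumed for this example).
    "birth_date": lambda r: date.fromisoformat(r["birth_date"]),
}


def mediate(record: Record, field_map: FieldMap) -> Record:
    """Translate one native record into the canonical schema."""
    return {field: convert(record) for field, convert in field_map.items()}


# Step 2: only once the mappings are settled is the flow built on top of them.
def integration_flow(sources: List[Tuple[List[Record], FieldMap]]) -> List[Record]:
    """Merge the mediated records from every (records, field_map) source."""
    merged: List[Record] = []
    for records, field_map in sources:
        merged.extend(mediate(r, field_map) for r in records)
    return merged


crm_rows = [{"cust_name": " Ada Lovelace ", "dob": "12/10/1815"}]
billing_rows = [{"customerName": "Alan Turing", "birth_date": "1912-06-23"}]
rows = integration_flow([(crm_rows, CRM_MAP), (billing_rows, BILLING_MAP)])
```

Because the format differences are resolved in the field maps first, the flow itself stays trivial; the trade-off, as discussed below, is that this focus on detail can obscure the larger architectural picture.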

The advantages of this approach include:

It’s typically a more natural and traditional way of approaching data integration.  Often called “data-driven” integration design, this approach deals with the details first, so by the time you get up to the integration flows there are few surprises and little rework to be done.  In most cases, it’s a bit less risky and less expensive.

The disadvantages of this approach include:

Starting with the details means you could become so involved in them that you miss the larger picture, and when all is said and done, the end state of your architecture appears poorly planned.  Of course, that depends on the types of data integration problems you’re looking to solve.

No matter which approach you choose, with some planning and strategic thinking, you’ll be fine.  However, there are different paths to the same destination, and some are longer and less efficient than others.  Whichever approach you pick, learn as you go and adjust as needed.