The Basics of Data Integration Patterns

I often give a Data Integration 101 talk. It covers the basics of data integration and tells people how to get started. The cloud, IoT, and big data seem to be driving renewed interest in this topic. More and more enterprises are reviewing their data integration solutions, and the solution patterns those tools provide, to capitalize on new cloud, IoT, and big data opportunities.

So, what are the basic data integration patterns to consider?  Here are a few you can add to your vernacular:

Data replication

This is the basic process of copying data from one system or data store to another. The data could move in an ETL/batch style, with many records moving at once, or a small chunk at a time, typically a row or record.
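
To make the distinction concrete, here is a minimal Python sketch; the store names and both replication functions are hypothetical stand-ins for real systems:

    # Hypothetical in-memory stand-ins for a source and a target data store.
    source_store = [
        {"id": 1, "F_name": "Ada"},
        {"id": 2, "F_name": "Alan"},
    ]
    target_store = []

    def replicate_batch(source, target):
        # ETL/batch style: the whole data set moves in one operation.
        target.extend(dict(row) for row in source)

    def replicate_record_at_a_time(source, target):
        # Record-at-a-time style: each row is written (and, in a real
        # engine, committed or acknowledged) as a separate operation.
        for row in source:
            target.append(dict(row))

    replicate_batch(source_store, target_store)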

Transformation is also part of this process. You may need to change the semantics of the data, or even its structure, and there needs to be a mechanism in the integration engine that’s moving the data to make that change. This could be a simple mapping, such as F_name(char 20) becoming First_Name(char 15), or a more complex transformation applied as the data moves, in batch or a record at a time.
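
As a rough illustration of the simple-mapping case above, a per-record transformation step might look like this sketch; the field names follow the example, and everything else is assumed:

    def transform_record(record):
        # Simple mapping: F_name (char 20) in the source becomes
        # First_Name (char 15) in the target, so the value is renamed
        # and truncated to fit the narrower column.
        return {"First_Name": record["F_name"][:15]}

The integration engine would apply a function like this to each record as the data moves, whether in batch or one record at a time.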

Data replication is one of the oldest data integration patterns, and was first used in the early days of networking.  However, it’s still the most popular way that data moves from place to place, cloud to cloud, database to database.

Data virtualization

This is an emerging pattern based on data abstraction: viewing data in ways that differ from its physical structure. The idea is to take several data stores, structured or unstructured, and virtually remap them so they appear to have an entirely different data structure.

Using this pattern, data can be combined from many different physical databases into one logical database. Or, data can simply be remapped into structures better suited to other uses. For instance, we could take an operational data store designed for sales transactions and leverage data virtualization to abstract and restructure the data into something that works better for data analytics.
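
As a sketch of the idea (the stores, field names, and the VirtualSalesView class are all assumed for illustration), a logical view might remap two physical stores on read, without copying any data:

    # Hypothetical physical stores: an operational sales store and a CRM store.
    sales_db = [
        {"txn_id": 100, "cust": "C1", "amt": 250.0},
        {"txn_id": 101, "cust": "C2", "amt": 90.0},
    ]
    crm_db = {"C1": {"name": "Acme"}, "C2": {"name": "Globex"}}

    class VirtualSalesView:
        # Presents one logical, analytics-friendly structure over both
        # stores. Nothing is copied; records are remapped on each read.
        def __init__(self, sales, crm):
            self.sales, self.crm = sales, crm

        def __iter__(self):
            for txn in self.sales:
                yield {
                    "transaction_id": txn["txn_id"],
                    "customer_name": self.crm[txn["cust"]]["name"],
                    "amount": txn["amt"],
                }

    for row in VirtualSalesView(sales_db, crm_db):
        print(row)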

Of course, there are many derivatives of these patterns, so it’s not just a matter of understanding them in the broad strokes, but in the details as well. Make sure to take your requirements into account first, then work backward from those requirements to the patterns and sub-patterns that make up the solution.

Part of this process involves picking the right technologies for your requirements. Some may do data replication, others will do data virtualization, and yet others will do both. Once you figure out which problems you want to solve, you can pick the technology that fits. You want to make sure the solutions follow the patterns. That’s how you win this game.

Comments

  • Fraschi

    Nicely done summary, David. Though I must say that I disagree with the idea of changing the semantics anywhere in the process. By this I am referring to the “business content” of data.
    Changing this would be a breach of some of the main best practices of data governance and lineage. It is already difficult to establish a common understanding of what the data actually means and how to use it. Changing semantics would make such an alignment impossible.