Data Integration must be Assumed, Far-reaching, and Systemic – An AWS re:Invent recap by David Linthicum

At AWS re:Invent, which was held last week in Las Vegas, Werner Vogels, Amazon’s Chief Technology Officer, outlined a vision for a modern data architecture that covered data ingestion and data governance. This vision included the announcement of AWS Glue.

While these services are additional tools in the shed, what struck me most as I saw the news coming out of re:Invent is that those who are relocating applications and data stores to the cloud assume that data integration will be there, will work everywhere, and will be systemic to everything.

These assumptions are typically foiled when they meet the real world. Integration is neither far-reaching nor systemic unless you make it so. The burden is on you, not your public cloud provider, and you can assume nothing as you migrate applications and data to the public cloud.

There are two basic approaches that I’m seeing emerge as enterprises move to the public cloud. The first is the effort to accomplish all data integration activities, whether ETL or real-time, using native cloud-based tools, such as those provided by AWS, Google, and Microsoft.

The problem with this approach is that there is almost never a single-cloud solution. We typically implement multi-cloud, which means that a single-cloud native data integration solution won’t get you very far. While it works and plays well with resources that exist on its native cloud, this approach won’t work and play well with resources, services, and workloads on other clouds.

So, when we’re moving to the cloud, data integration cannot be assumed, and cloud-native data integration technology is not all that far-reaching. So, what’s an enterprise to do?

The second approach is simple, really. It’s a matter of extending your existing enterprise-based data integration technology to your cloud-based workloads, working from within the enterprise to without.

What remains a requirement is that the data integration solution of choice has a cloud-based analog: a version of the data integration technology that runs in the cloud, sometimes in many clouds, and is also available as an on-premises implementation.

This approach does a few things:

First, it’s enterprise-centric, or can be. You don’t always have to join cloud-based workloads to other cloud-based workloads using only the cloud-based data integration tools. While you can take that approach as an option, it’s not forced upon you.

Second, you’re not limited by data integration boundaries. For instance, there is not a requirement that you limit your data integration solution to what exists in the cloud, or limit the number of on-premises systems that it will communicate with.

Data integration becomes the single most limiting factor for workloads that have already moved to the public cloud. Security, compliance, and governance were the obstacles to getting there; data integration, which was once assumed to be in place, turns out not to be. Enterprises will scramble to find a solution when assumptions meet reality, when just a bit of planning could have avoided all of this. Choose your data integration technology wisely.