Remember Your Data Integration Fundamentals
Those who have been put in charge of a data integration project are often unsure where to start and what to do. Although data integration is a mature discipline, we’ve recently added more approaches and technologies to the stack. So, if you thought you knew data integration from years back, you need to do a bit of relearning.
For this blog, I’m going to put together some data integration fundamentals that you can use as a starting point for your own data integration project. These fundamentals are not just the older stuff; they reflect the state of the art in how we should approach data integration today. Here they are:
Metadata is everything. This means that the data about the data will drive the way you approach the integration of the data. You need to understand the true meaning of the information contained within the source and target systems. Find a single source of truth about the data. For instance, what’s the single source of truth for customer information, which is typically stored all over the enterprise? The same question applies to sales data and inventory data.
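To make this concrete, here is a minimal sketch of a field-level metadata catalog that records what each attribute means and which system is its single source of truth. The system names and fields are hypothetical, and a real catalog would live in a metadata repository rather than a Python dictionary:

```python
# Hypothetical metadata catalog: each logical field maps to its meaning,
# its single source of truth, and the other systems that also hold copies.
CATALOG = {
    "customer_email": {
        "meaning": "Primary contact email for a customer account",
        "source_of_truth": "crm",
        "also_found_in": ["billing", "support_desk"],
    },
    "inventory_on_hand": {
        "meaning": "Units physically in the warehouse, net of reservations",
        "source_of_truth": "erp",
        "also_found_in": ["storefront_cache"],
    },
}

def source_of_truth(field: str) -> str:
    """Return the authoritative system for a field; raises KeyError if unknown."""
    return CATALOG[field]["source_of_truth"]

print(source_of_truth("customer_email"))  # crm
```

The point of the sketch is simply that every integration decision downstream (which copy wins, which system feeds which) can be answered by looking the field up, rather than by tribal knowledge.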
Flows make it go. Once we understand the meaning of the data, including the single source of truth, it’s time to figure out how the data will flow from system to system. While most data integration flows are simple replication, there is also the process of changing the structure and the content of the data flowing from system to system, so that the target system receives data in its own native structure and format.
This is not such a big deal if you’re replicating data between two relational databases, but these days we move data between SQL and NoSQL systems, or between structured and unstructured data stores. The data integration engine must deal with the complexity of accommodating the ways that the enterprise stores data, and, in essence, keep those complexities in their own domain.
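As a sketch of what such a transformation step can look like, here is a hedged Python example that reshapes a flat row from a hypothetical relational source into the nested document a hypothetical NoSQL target expects. All field names are invented for illustration:

```python
def to_target_document(row: dict) -> dict:
    """Transform a flat relational row (hypothetical schema) into the nested
    document a NoSQL target expects: rename fields, convert types, and group
    the address columns into a sub-document."""
    return {
        "customerId": int(row["cust_id"]),         # string key -> integer id
        "name": row["cust_name"].strip().title(),  # normalize whitespace and casing
        "address": {                               # flat columns -> nested document
            "street": row["addr_street"],
            "city": row["addr_city"],
        },
    }

row = {"cust_id": "42", "cust_name": " ada lovelace ",
       "addr_street": "12 Main St", "addr_city": "London"}
print(to_target_document(row))
```

Even this toy version shows the three moves most flows make: renaming, type conversion, and restructuring, so the target never has to understand the source’s layout.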
Think security and governance. Two tenets of data integration that are not as well understood are security and governance. This issue becomes more important as we move to the cloud, since the data is often physically out of our control. Those charged with data integration may decide to encrypt the data, either in flight (as it moves from place to place), or at rest (as it’s stored), or both. Once encrypted, the information is relatively safe.
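What this looks like in practice is usually a matter of pipeline configuration rather than code. The fragment below is a hypothetical configuration, not tied to any particular integration tool, showing in-flight and at-rest encryption declared side by side:

```yaml
# Hypothetical pipeline configuration (illustrative only).
pipeline:
  name: orders_to_warehouse
  in_flight:
    protocol: tls          # encrypt data as it moves between systems
    min_version: "1.2"
    verify_peer: true
  at_rest:
    cipher: aes-256-gcm    # encrypt data where it lands
    key_source: managed_kms  # keys held in a key management service, not in the config
```

The design point worth noting is the last line: the configuration names where keys come from, but never contains the keys themselves.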
Governance, in the context of data integration, means that we place active policies around the use of the data, data flows, transformations, etc., and thus we’re able to control and track how those entities are leveraged. This allows us to avoid having somebody change a flow, or change the structure of a target system, and break the data integration solution. All dependencies are tracked, and you must have permission to make changes.
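A minimal sketch of that idea follows; the policy store, role names, and flow names are all hypothetical, standing in for whatever governance tooling you actually run:

```python
# Hypothetical governance check: a flow may only be changed by a role the
# policy grants, and every dependent asset is surfaced before the change
# is allowed to proceed.
POLICIES = {
    "orders_to_warehouse_flow": {
        "allowed_roles": {"integration_admin"},
        "dependents": ["sales_dashboard", "inventory_forecast"],
    },
}

def authorize_change(flow: str, role: str) -> list:
    """Return the dependent assets impacted by changing a flow, or raise
    PermissionError if the role is not allowed to modify it."""
    policy = POLICIES[flow]
    if role not in policy["allowed_roles"]:
        raise PermissionError(f"{role} may not modify {flow}")
    return policy["dependents"]

print(authorize_change("orders_to_warehouse_flow", "integration_admin"))
```

The two behaviors in this sketch are exactly the two governance guarantees described above: changes require permission, and dependencies are tracked so nothing breaks silently.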
Whether this is your first data integration project or your fiftieth, the fundamentals of data integration continue to be important to your success.