With Big Data systems now in the mix within most enterprises, those charged with data integration are interested in how their world will soon change. Rest assured, most of the patterns of integration that we deal with today will still be around for years to come.
However, there are some clear trends that data integration managers need to understand, such as:
- The ability to apply structure to data at the time of use.
- The ability to store both structured and unstructured data.
- The need for faster data integration technology.
The ability to apply structure to data at the time of use refers to the fact that Big Data systems built on the Hadoop family of technologies can add a structure when the data is read. Thus, you don’t need to pre-define a structure as you would in the relational world; you can map a structure onto existing data.
While this has certain advantages, such as the ability to build dynamic structure around in-line analytical services, it also adds complexity to data integration. Most data integration technology relies on some type of structure at either end of the integration flow, so you need to layer a structure onto the data as it is consumed, translated, and produced from one system or data store to another.
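One way to picture layering a structure onto data at read time is a minimal sketch like the following. This is illustrative only, not any particular Hadoop tool: the raw records, field names, and delimiter are all assumptions, and a real system would apply the same idea (Hive external tables are one example) at a much larger scale.

```python
import csv
import io

# Hypothetical raw data already sitting in a store: plain delimited text
# written with no pre-defined schema. Values and fields are illustrative.
RAW = "1001|acme|2023-04-01\n1002|globex|2023-04-02\n"

# The structure is supplied at the time of use, not when the data was written.
ORDER_SCHEMA = [("order_id", int), ("customer", str), ("order_date", str)]

def read_with_schema(raw_text, schema, delimiter="|"):
    """Map a structure onto existing raw records as they are consumed."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text), delimiter=delimiter):
        # Pair each raw value with its (name, type) entry from the schema
        # and cast it on the way in.
        rows.append({name: cast(value)
                     for (name, cast), value in zip(schema, record)})
    return rows

orders = read_with_schema(RAW, ORDER_SCHEMA)
```

The same raw bytes could be read tomorrow with a different schema, which is the flexibility, and the complexity, the paragraph above describes: the integration flow, not the store, is where the structure lives.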
The ability to store both structured and unstructured data, which goes hand in hand with layering on structure dynamically, brings both flexibility and complexity. Big Data systems are basically file systems with anything and everything stored in them. This means that documents, text, and data are all intermingled. This information may be bound to a structure, or freestanding. In any event, you need to be able to move both structured and unstructured data from store to store.
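A small sketch of what "intermingled" means in practice: an integration flow may have to decide, record by record, whether a payload is structured or freestanding text before it can route it. The payloads and the routing rule here are assumptions for illustration, not a real store's contents.

```python
import json

# Illustrative mixed payloads as they might sit side by side in a
# Big Data file store: structured records and free text intermingled.
payloads = [
    '{"id": 1, "status": "shipped"}',       # structured (JSON record)
    "Customer called about a late order.",  # unstructured note
]

def classify(payload):
    """Tag a payload as structured or unstructured by trying to parse it."""
    try:
        json.loads(payload)
        return "structured"
    except ValueError:
        return "unstructured"

# Route each payload to the handling path for its kind.
routed = {"structured": [], "unstructured": []}
for p in payloads:
    routed[classify(p)].append(p)
```

In a relational world this decision never arises, because everything in the store is structured by definition; here the flow itself has to carry the logic.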
The need for faster data integration technology follows from the fact that Big Data systems deal with much larger volumes of data than more traditional enterprise systems, so there is simply more data to move from data store to data store. Thus, there is renewed focus on whether data integration technology can keep up with these performance requirements.
In many respects, the ability of a data integration solution to move large volumes of structured and unstructured data between data stores depends as much on how you’ve designed the integration flows as on the integration technology itself. As Big Data systems move into your enterprise and you join them together using data integration technology, you’ll find that the patterns of the integration flows need to change as well. Before these systems go into production, it’s a good idea to review what needs to change and the best practices around the design of the integration flows.
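The point that flow design matters as much as the tooling can be sketched simply: moving partitions of a data set concurrently, rather than one after another, changes throughput without changing the underlying technology at all. The `copy_partition` step below is a hypothetical stand-in for a real read-from-one-store, write-to-another operation.

```python
from concurrent.futures import ThreadPoolExecutor

def copy_partition(partition):
    """Hypothetical per-partition move; stands in for a real store-to-store
    copy. Returns the number of records moved."""
    records = [f"row-{partition}-{i}" for i in range(1000)]
    return len(records)

partitions = range(8)

# A flow designed to move partitions in parallel rather than serially:
# the tool is the same, the design is what buys the throughput.
with ThreadPoolExecutor(max_workers=4) as pool:
    moved = sum(pool.map(copy_partition, partitions))
```

Whether parallelism of this kind is safe depends on ordering and consistency requirements between the stores, which is exactly the sort of question a pre-production review of the flow design should answer.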
Big Data is an evolution in the way we store and deal with data. It provides more primitive, commodity mechanisms that offer greater flexibility and the ability to handle larger amounts of data using highly distributed data management technology. Data integration technology needs to adapt to this change, which is more far-reaching than anything we’ve seen of late.