To compete on Big Data and analytics, today’s always-on enterprise needs a well-designed, evolving, high-level architecture that continuously provides trusted data originating from a vast and fast-changing range of sources, often in different formats and within different contexts.
To meet this challenge, the art and science of data integration is evolving from duplicative, project-based silos that have consumed organizations’ time and resources to an architectural approach, in which data integration rests on sustainable and repeatable practices and data is delivered automatically whenever the business requires it.
However, moving to an architectural approach isn’t an overnight hop. Rather, it’s a multi-year, multi-stage journey, through which organizations gradually and successfully marshal their people, energies and technologies to build a highly functioning Next-Gen DI architecture.
There are four distinct stages organizations pass through on this journey. Many organizations are likely still in the first or second stage, and few are likely to have reached the fourth stage yet. But in time, many will:
Stage 1 – Project-Based: This stage is marked by the total absence of any data integration architecture. As the name suggests, a project-based data culture is one in which any data integration efforts are attempted on a project-by-project basis. These projects are likely highly siloed affairs, with little to share or pass on to other projects taking place within the enterprise. Imagine, too, the drama that takes place when there is a merger or acquisition, in which two or more huge sets of corporate data may take months or years to bring together, if they’re brought together at all.
The result is multiple, unconnected DI platforms within the same organization. Each has its own cache of hand-coded scripts, or even hand-coded connectors from individual applications. In addition, the quality of data delivered may be questionable, since everyone is pulling from his or her own selected data sources, and using his or her own front-end tools, which are usually spreadsheets.
There is an additional characteristic of a project-based data culture: data professionals tend to carry all the knowledge and techniques needed for data management and integration in their heads. Knowledge about projects – what scripts are employed, how the pieces fit together to come up with a result – is not documented or systematized as a process. Inevitably, these professionals will take time off, will get a better offer, or will retire. And with those individuals goes everything that is known about the databases, scripts and applications.
Stage 2 – Application-Centric: As organizations progress to an application-centric data culture, things are more systematized than in the project-based culture, but not by much. Applications give rise to a more disciplined approach to data management and integration. An organization may even have a platform, such as an IBM System i computer, on which all apps share a common database. In this kind of environment, hand-coded scripts are still prevalent, but some data integration is facilitated through connectors and interfaces to enterprise applications. At this stage, most integration scripting may finally be documented.
In addition, data warehouses or data marts may begin to spring up around the enterprise, usually initiated by the marketing department, to capture selected datasets for analysis by selected parts of the business. However, there are still many other databases and data sources around the enterprise. And most of this data is still in silos, attached to their specific applications.
Stage 3 – Process-Centric: Here, some automation of data integration processes is introduced. Many of the remaining hand-coded scripts are replaced by automated routines that are, in many cases, reusable. Data integration also expands to cover multiple disciplines, including data quality and data exchange. At this stage of the evolution, enterprise-wide metadata and master data management are employed against most core systems to work toward a single version of the data. Much historical data, and some operational data as well, is maintained and accessed through data warehouses.
Some data professionals may also be working with frameworks such as Hadoop. However, they may be doing so with selected data within their departments, and not necessarily sharing it with the enterprise as of yet. That work typically sits well outside of the data warehouse. For example, IT administrators may be using Hadoop to capture and manage weblog data so they can manage web performance better.
Stage 4 – Architecture-Based: At this stage, a “muscle memory” takes over, meaning that most, if not all, data integration for the enterprise takes place automatically, as it is baked into all systems and processes. There is no need for data specialists to scramble when a new requirement comes up from the business – the integration just happens. Likewise, when a new data source is identified – say, through a merger, an acquisition, or simply a new business partner brought into the mix – again, the data integration just happens. At the front end, if a new application or user interface starts getting used, it is automatically supported.
An architectural approach enables this to happen. In this state, DI is continuous and available across the enterprise, via standardized and reusable data services and metadata. Most, if not all, interactions are automated. Data sources are added to the flow as quickly as they are needed by the business. And, very importantly, data is trustworthy, deduped and verified from original sources.
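To make the idea of a standardized, reusable data service a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any particular product: the class name IntegrationService, its methods, the "email" match key and the sample records are all hypothetical. The point it tries to show is that a new source (say, from an acquisition) is registered once, and every consumer then gets deduplicated records through the same service rather than through a fresh hand-coded script.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class IntegrationService:
    """Hypothetical shared data service: sources register once, all consumers reuse it."""

    key_field: str  # field used to recognize duplicate records (an assumption)
    sources: Dict[str, Callable[[], Iterable[Record]]] = field(default_factory=dict)

    def register_source(self, name: str, reader: Callable[[], Iterable[Record]]) -> None:
        # Adding a source is a registration step, not a new integration project.
        self.sources[name] = reader

    def integrated_records(self) -> List[Record]:
        # Read every registered source and keep the first record seen per key.
        seen: Dict[str, Record] = {}
        for name, reader in self.sources.items():
            for record in reader():
                key = record[self.key_field].strip().lower()
                if key not in seen:
                    # Tag each record with where it came from, for traceability.
                    seen[key] = {**record, "source": name}
        return list(seen.values())


# Usage with made-up sample data
service = IntegrationService(key_field="email")
service.register_source(
    "crm",
    lambda: [{"email": "ann@example.com", "name": "Ann"}],
)
service.register_source(
    "acquired_co",
    lambda: [
        {"email": "ANN@example.com ", "name": "Ann B."},
        {"email": "bob@example.com", "name": "Bob"},
    ],
)
records = service.integrated_records()
for record in records:
    print(record)
```

In this sketch, the duplicate entry for Ann from the second source is dropped because its normalized email matches one already seen, and each surviving record carries the name of the source it came from. A real architecture would of course add metadata management, data quality rules and far richer matching logic.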
At the front end, there’s self-service. Business users can design their own queries and interfaces and access enterprise data with minimal intervention from the IT or data management department. Data is viewed and analyzed on a real-time or right-time basis where required.
At this stage, organizations adopt Lean integration principles, in which data integration is part of a systematized, architecturally designed process – emphasizing repeatability, continuous improvement and quality. For any organization that is facing the Big Data challenge – with terabytes or petabytes of structured, unstructured or semi-structured data moving through the business – data integration needs to be addressed as a repeatable and automated process. There simply is no other way.
For more discussion on this evolution, listen to Dr. Claudia Imhoff, John Schmidt and me discuss Architectural Best Practices for Next Generation Data Integration.