We have all heard of data federation, and of late we have also been hearing how simple, traditional data federation often gets passed off as data virtualization. Let’s get back to basics and take a hard look at what the real need is.
Data federation is not a new concept. When it first arrived on the scene many years ago, technologists got excited: it offered a way to quickly access numerous disparate data sources without physically moving data. Years passed and the term kept appearing in research paper after research paper – but the anticipated widespread adoption did not happen. TDWI’s Wayne Eckerson does a great job of tracking the evolution of data federation in his recent webinar and blog. Simple, traditional data federation does one thing and only one thing well – it creates a virtual view across heterogeneous data sources, delivering data in real time, typically to reporting tools and composite applications. In its very simplicity lay its downfall.
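To make the "virtual view" idea concrete, here is a minimal sketch in Python. All source names and schemas are hypothetical; the point is that rows are pulled from each underlying source only at query time, with nothing copied or materialized in advance:

```python
# Minimal sketch of a federated "virtual view": nothing is copied or
# materialized; rows are pulled from each source only when queried.
# The sources and schemas below are invented for illustration.

def warehouse_source():
    # Stands in for a SQL data warehouse connection.
    yield {"customer_id": 1, "name": "Acme", "origin": "warehouse"}

def crm_source():
    # Stands in for a CRM application's API; note the differing field name.
    yield {"cust_id": 2, "name": "Globex", "origin": "crm"}

def virtual_customer_view():
    """Federate both heterogeneous sources into one schema at query time."""
    for row in warehouse_source():
        yield {"id": row["customer_id"], "name": row["name"], "source": row["origin"]}
    for row in crm_source():
        yield {"id": row["cust_id"], "name": row["name"], "source": row["origin"]}

rows = list(virtual_customer_view())
print([r["name"] for r in rows])  # ['Acme', 'Globex']
```

A reporting tool querying `virtual_customer_view` sees one unified schema and gets current data on every call, which is exactly the real-time appeal – and also the limitation, since no profiling, cleansing, or rich transformation happens along the way.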
Consider the two primary usage scenarios for data federation: one must co-exist with and enhance a data warehousing architecture, while the other must deliver a data services layer to an SOA. If that is indeed the case, why do data federation technologies offer up only simple SQL or XQuery data transformations when rich, ETL-like transformations are expected? And how can they assume that the data in the sources they federate is of good quality? SOA, for its part, assumes that the data services layer will take care of data access, data profiling, data quality, data transformation and data delivery – not simplistically wrap data access with a Web service and sign off, as simple, traditional data federation does.
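The contrast can be sketched in a few lines of Python. This is a toy pipeline, not any vendor's implementation: each stage the text names (access, profiling, quality, transformation, delivery) is applied in turn, which is precisely the work a thin Web-service wrapper around source access skips:

```python
# Hypothetical sketch of a data services layer that does more than wrap
# data access: it profiles, cleanses, and transforms before delivering.
# The rules and sample data are invented for illustration.

def access(source_rows):
    # Fetch rows from an underlying source (stubbed here).
    return list(source_rows)

def profile(rows):
    # Profiling: count inconsistencies, e.g. missing email values.
    return sum(1 for r in rows if not r.get("email"))

def cleanse(rows):
    # Data quality: drop rows that fail a simple rule.
    return [r for r in rows if r.get("email")]

def transform(rows):
    # ETL-like enrichment, beyond a plain SQL projection.
    return [{**r, "domain": r["email"].split("@")[1]} for r in rows]

def deliver(rows):
    # Delivery: could be a web service, file, or message queue.
    return rows

raw = [{"name": "Ann", "email": "ann@example.com"},
       {"name": "Bob", "email": ""}]
issues = profile(access(raw))
result = deliver(transform(cleanse(raw)))
print(issues, result[0]["domain"])  # 1 example.com
```

Simple federation, by contrast, would expose `raw` directly through a service endpoint – bad rows, inconsistencies and all.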
Now, in the midst of all this mess, a new kid on the block shows up – data virtualization. Once again, simple, traditional data federation technologies take the road much traveled and decide to quickly rename themselves as data virtualization. Guess what – wrong again! Data federation is missing the point big time. Data virtualization should borrow from the magical world of virtual machines where all the underlying complexity is hidden from the end user while it is still handled elegantly by the technology. In the world of data virtualization, this must translate to letting the business own the data while IT retains control.
Mike Gilpin and Noel Yuhanna of Forrester Research articulated this as “an information fabric presents a business-friendly virtual view of diverse information.” That sounds like next-generation data virtualization: a technology that hides the underlying complexity from the consumer yet still handles it under the covers. In other words, it should present a business-friendly view by doing all the heavy lifting – accessing all types of data, analyzing that data for inconsistencies and inaccuracies, applying not just simple but rich, complex data transformations, and delivering the data at any speed, whether through physical data movement or data federation, over any protocol and to any data consumer. In a recent webinar, HealthNow NY described this concept in great detail. David Linthicum also covered it in a recent blog.
What do you think?
Next up – I will discuss the characteristics of next generation data virtualization in greater detail.