Data Integration - Informatica


What, Exactly, is 'Data Warehouse 2.0'? Opinions Vary

Joe McKendrick

It seems in recent years pundits and vendors alike have been applying the 2.0 label to everything and anything emerging across the technology plain. In some cases, the new label has stuck - witness the widespread adoption of the terms 'Web 2.0' and its business sibling, 'Enterprise 2.0.'

Sometimes this is just marketecture, yet the 2.0 identifier does convey a certain sense of maturity: that a technology is moving to a new stage of sophistication and of engagement with the business and its end users.

There have been moves afoot to identify the next generation of data warehousing as "Data Warehouse 2.0." However, there are differences of opinion as to what exactly will constitute DW 2.0, and thus no clear standard sense of direction in the market.

Some see DW 2.0 as in tune with the Web 2.0 phenomenon, that is, as on-demand analytical capabilities delivered via software as a service. Others see DW 2.0 as another step beyond the data silos of yore, moving in the direction of greater cross-enterprise capabilities. Still other industry experts view DW 2.0 as supporting more intelligent, lifecycle-oriented data stores that can archive and analyze all forms of data, structured and unstructured.

Forrester analyst Jim Kobielus, for one, relates Data Warehouse 2.0 to two emerging phenomena: (1) turnkey, analytic databases or appliances that can be dropped into an operation, and (2) analytic capabilities delivered via the "cloud" as an on-demand service.

Kobielus predicts many companies – short on staff and expertise – will turn to inexpensive analytic horsepower available via subscription-based data warehouse services. "As it becomes available from many service providers, DW 2.0 will offer an ever-expanding supply of inexpensive, plentiful analytic horsepower," Kobielus writes. "Over the coming decade, software-as-a-service (SaaS) providers will begin to offer feature-complete, subscription-based business intelligence/data warehouse services for high-performance, high-volume, complex analytics. These clouds will leverage the full virtualized, distributed, scalable, grid-computing fabric that [vendors] can bring to bear on data mining, performance optimization, and other compute- and data-intensive tasks."

Bill Inmon, the guru of gurus for data warehousing, published a definition of Data Warehousing 2.0 a couple of years back. DW 2.0, as Inmon describes it, includes "many integrated features that were never found in the first generation" of data warehouses. In addition to integrated transaction data, DW 2.0 includes qualified and edited unstructured data in several forms; integrated metadata, including both business metadata and technical metadata; online high-performance data that can be updated; reference master data; profile data records; and continuous time span data.

Inmon arrived at his definition of DW 2.0 well before others came along, as he made clear in an interview in Data Management Review a couple of years back. There is a need for a vision for the future of data warehousing, "which I believe a lot of people in the industry have wrong," he said. "It came from confusion and from vendors trying to sell products. There were people building transactional systems they were calling a data warehouse; people building federated versions of a data warehouse; people building data marts that they were calling a data warehouse. Those are just some of the renditions."

The main distinction between DW 2.0 and DW 1.0 "is that the DW 1.0 never recognized the lifecycle of data within the corporation," Inmon explained. "DW 1.0 said, 'Here's some data.' DW 2.0 says, 'Here's the data; it has a lifecycle, and each of the different portions of the lifecycle have unique characteristics.'" Another major difference, Inmon went on to say, "is the recognition that unstructured data and structured data should both contribute to the data warehouse. There is a wealth of information in the world of unstructured technology, but it has to be built properly for the data warehouse."

Dan Linstedt also delved into the promise of DW 2.0: the ability to manage structured and unstructured data and everything in between. He also sees Data Warehouse 2.0 establishing a common data model that can be applied across the enterprise. As he related in a recent blog post, while enterprise data warehousing is supposed to reduce the incidence of stovepiped BI approaches seen in spreadsheets and spreadmarts, many data warehouse sites may grow so complex, and drift so far from their original design, that end users end up running back to spreadsheets.

As Dan explains it, the typical "star-schema" model works well and is cost effective for the first series of DW implementations. However, by the fifth or sixth iteration, things get too complex, and the original designs of the DW become lost or distorted. "Too many different kinds of data are added to the dimension 'to conform it to the enterprise' which distorts its original purpose." In fact, if done improperly, "each time IT increases the size of this monster, it always creeps in to higher cost, and longer implementation timeframes."

Thus, agility is lost, and "a simple 'change' that the business has to make (that used to cost $150k and take 90 days) now costs well in to the $350k range and takes six months or more," Linstedt related. "What was a conformed dimension now becomes a 'deformed' dimension, and has trouble meeting the business needs." As a result, business users devise their own workarounds for this clumsy enterprise beast, which means going back to relying on spreadsheets.

DW 2.0, Dan relates, "comes with the standard definitions that the industry has lacked over the years, finally and at last we have standards, definitions, and frameworks to follow." Plus, an essential piece of DW 2.0, Dan believes, is putting the right data model in place.

Perhaps, with the right data model and the associated rules in place enterprise-wide, it also becomes easier to make a business case, because the business can now see and feel how data relates to business metrics and performance measurements. It’s no longer an abstract technical concept; organizations can actually assess how their strategies are working in operational terms.
