For too long, many enterprises have been attempting to sort through increasingly complex spaghetti architectures with point-to-point data integration. “They get to the point where when they want to introduce a new product or make a change, they have to touch 30 different systems,” says John Akred, data and platforms lead at Accenture Technology Labs. “That has real consequences in the marketplace for enterprises.”
John continued that Hadoop – an open-source software framework that enables applications to run across large arrays of nodes, accessing petabytes’ worth of data – will help organizations manage and scale up to the huge volumes of unstructured and semi-structured data now surging into organizations. I recently had the opportunity to join John, along with Julianna DeLua, Enterprise Solution Evangelist for Big Data at Informatica, for a discussion of Hadoop’s role in the emerging data-as-a-platform paradigm. It was the second session of the Hadoop Tuesdays webinar series, sponsored by Informatica and Cloudera.
“Hadoop is a fantastic tool,” Akred said. “It gives us the ability to integrate whole new ranges of data into our worldview for an enterprise.” Until recently, he said, data has been stored and organized around individual applications.
A service-oriented architecture is the best way to leverage the power of Hadoop, John continued. “We take the data layer, and take data stores like Hadoop, tools like Informatica, and existing enterprise systems, and integrate those at the data layer. We abstract that into an integrated data platform via services.” Once data services are in place, “Hadoop gives you tremendous processing power,” he said. This will enable organizations not only to open up enterprise silos, but also to share information with customers and partners outside the enterprise walls. “Hadoop plus some other enterprise systems allow you to do that very efficiently,” John said.
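The abstraction John describes can be illustrated with a small sketch: callers ask a service layer for data by domain, and the layer routes each request to whichever backend holds it – Hadoop, a warehouse, or a CRM system. The class and method names below are hypothetical, purely for illustration; real data services would sit behind SOAP/REST endpoints and enterprise integration tooling.

```python
# Hypothetical sketch of a data-service facade over heterogeneous stores.
# In practice the registered backends would call Hadoop, an EDW, CRM, etc.

class DataService:
    """Uniform query interface; callers never know which system answers."""

    def __init__(self):
        self._backends = {}

    def register(self, domain, fetch_fn):
        # e.g. register("clickstream", hadoop_query) or ("customers", crm_query)
        self._backends[domain] = fetch_fn

    def get(self, domain, **criteria):
        # Route the request to the backend that owns this data domain.
        return self._backends[domain](**criteria)


# Stand-in backend: a real one would submit a Hadoop job or SQL query.
service = DataService()
service.register("usage", lambda meter: {"meter": meter, "kwh": 42.0})
print(service.get("usage", meter="meter-1"))
```

The design point is that consumers depend only on the service contract, so a backend can be swapped (say, moving a dataset from a relational store into Hadoop) without touching the 30 downstream systems Akred mentions.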
An example of where Hadoop and data services are making a difference is the utility industry, where new “smart grid” initiatives are resulting in a data explosion, John explained. “Smart grid introduces a large range of digital sensors to the distribution grid,” he said. “These may take the form of time-stamped numeric readings; those kinds of sensors send a message every 15 minutes or so about usage.” Other systems, including CRM and enterprise resource planning systems, also contain valuable data, he added.
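A workload like this maps naturally onto Hadoop's model: a mapper parses each 15-minute reading into a (meter, usage) pair, and a reducer totals usage per meter. The sketch below mimics that flow in plain Python – the tab-separated input format and field names are assumptions for illustration, not an actual utility's schema.

```python
# Minimal MapReduce-style aggregation of smart-meter readings.
# Assumed (hypothetical) input format: meter_id<TAB>timestamp<TAB>kwh
from collections import defaultdict

def map_readings(lines):
    """Mapper: emit (meter_id, kwh) pairs from raw sensor lines."""
    for line in lines:
        meter_id, _timestamp, kwh = line.strip().split("\t")
        yield meter_id, float(kwh)

def reduce_usage(pairs):
    """Reducer: total usage per meter (Hadoop would shuffle/group by key)."""
    totals = defaultdict(float)
    for meter_id, kwh in pairs:
        totals[meter_id] += kwh
    return dict(totals)

if __name__ == "__main__":
    sample = [
        "meter-1\t2011-10-25T00:00\t0.4",
        "meter-1\t2011-10-25T00:15\t0.5",
        "meter-2\t2011-10-25T00:00\t1.2",
    ]
    print(reduce_usage(map_readings(sample)))
```

On a real cluster the same two functions could run as a Hadoop Streaming job, with the framework handling the partitioning of petabytes of readings across nodes.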
Julianna cautioned that effectively managing Hadoop still requires a robust integration and ETL infrastructure. Otherwise, Hadoop environments will consume the time and resources of data and development departments.
In the third Hadoop Tuesday webcast (Tuesday, October 18th), Matt Aslett of The 451 Group discussed the components and rise of the Hadoop ecosystem. (A recap is to come!)
This coming Hadoop Tuesday, October 25th, will feature David Menninger of Ventana Research, who will discuss user adoption patterns. We will also be joined by Binh Tran, chief technology officer and co-founder of Klout, to learn how the growing online service is putting Hadoop to use to serve its base of millions of users.
Additional sessions will feature Omer Trajman of Cloudera (November 15), David Linthicum of Blue Mountain Labs (November 29), and Charles Zedlewski of Cloudera and Wei Zheng of Informatica (December 13). Executives from companies that have already implemented Hadoop within their data operations will also be joining us.