There’s a historical parallel for Hadoop’s rapidly growing ecosystem and the excitement around it: the Linux operating system followed a similar trajectory more than a decade ago. Then, as companies embraced the open source system, a vibrant ecosystem of users, vendors and community supporters evolved to move the technology forward and add value.
Now, we see the same thing happening with Big Data, as an impressive ecosystem emerges around Hadoop. “This is a very strong and vibrant and varied community,” Matt Aslett, analyst with The 451 Group, pointed out at the recent Hadoop Tuesdays webcast. “It very much reminds us of the early stages of Linux, where you have vendors and users who each have something to gain from Hadoop being successful.”
I had the opportunity to join Matt, along with Julianna DeLua, Enterprise Solution Evangelist for Big Data from Informatica and producer of the Hadoop Tuesdays webinar series, for a discussion of the emerging Hadoop ecosystem. The session was the third webcast in the series, sponsored by Informatica and Cloudera.
The Hadoop project, like its Linux predecessor, brings users and vendors together for a common purpose, with little friction, Matt points out. “I think sometimes when people think about open source, there is an assumption there’s an inherent tension there. That’s not necessarily the case – Hadoop has lots of different organizations from different perspectives working together for their mutual benefit.” Matt says there are three types of vendors offering solutions within the Hadoop ecosystem: supporters and connectors, with products that complement Hadoop; data management and integration providers; and analytics providers seeking to build their solutions on Hadoop platforms.
Matt also provided examples of three prominent organizations employing Hadoop to bring value to their Big Data:
Orbitz: The travel website initially invested in Hadoop as a more cost-effective way to “store and process log data coming from their search files,” Matt said. Orbitz quickly recognized that the data being gathered could be deployed for further analysis. “This demonstrates how Hadoop complements the existing data warehouse, rather than replacing it,” Matt said. “In almost all projects that we see, it’s about complementing the existing data warehouse.”
AOL Advertising: “They use Hadoop to do the real-time data processing for targeted advertising based on clickstreams, cookie data, campaign history and other data,” says Matt. “It’s got to be relevant, and they have a 14-millisecond targeted ad response time. They use Hadoop to do number crunching and analytics to ensure they’re delivering the right ad, based on the individual user.” Matt adds that AOL Advertising uses a NoSQL database to process the data. “One of the questions we often get is: ‘how does Hadoop relate to NoSQL databases?’ This is really a good example of how they can be complementary. Hadoop is there for the number crunching; NoSQL is there for the real-time delivery of the targeted ads.”
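The division of labor Matt describes (Hadoop for offline number crunching, a NoSQL store for low-latency ad serving) can be sketched in a few lines of Python. This is a toy illustration only: the clickstream events, ad categories, and dict-backed store are invented stand-ins, not AOL Advertising’s actual stack or data.

```python
# Toy sketch of the batch + real-time pattern: an offline "Hadoop-style"
# job crunches clickstream history into a per-user result, and a
# "NoSQL-style" key-value store serves that result at lookup time.
from collections import defaultdict

def batch_score_ads(clickstream):
    """Offline number crunching: count clicks per (user, ad category),
    then keep only each user's top category -- a map/reduce in miniature."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, ad_category in clickstream:      # "map": emit (user, category)
        counts[user][ad_category] += 1
    # "reduce": collapse each user's counts to the single best category.
    return {user: max(cats, key=cats.get) for user, cats in counts.items()}

class AdStore:
    """Stand-in for the NoSQL store that serves targeted ads in real time."""
    def __init__(self, precomputed):
        self._table = precomputed              # loaded from the batch job
    def lookup(self, user, default="generic"):
        return self._table.get(user, default)  # fast key-value read

clicks = [("alice", "travel"), ("alice", "travel"),
          ("alice", "sports"), ("bob", "tech")]
store = AdStore(batch_score_ads(clicks))
print(store.lookup("alice"))  # travel: her most-clicked category
print(store.lookup("carol"))  # generic: no history, fall back
```

The point of the split is that the expensive aggregation runs offline on the full history, while the serving path is a single key lookup, which is how a 14-millisecond response budget becomes feasible.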
USA Search: This organization operates the official search.us.gov site and provides hosted search for government organizations. “They really ran out of headroom in terms of scalability,” Matt says. Hadoop, with its low costs, enables the various agencies to store any and all data that is relevant. “That really opens up opportunities for new analysis capabilities and new search capabilities in the future.”
In today’s Hadoop Tuesdays webcast (November 15), Omer Trajman of Cloudera discussed best practices for deploying Hadoop projects. We also discussed how to move from architecture to implementation, and we covered typical pitfalls faced when deploying Hadoop in production, architectural approaches, and how to design and deploy projects. Wrapping up the series in the final two sessions will be David Linthicum of Blue Mountain Labs (November 29) and Charles Zedlewski of Cloudera and Wei Zheng of Informatica (December 13).