The first time I heard about Hadoop was several years ago. I was at a local Teradata user group meeting in San Francisco, where many of the technology and other data-centric companies in Silicon Valley go to socialize and share ideas. One of our joint customers in the insurance industry asked Teradata’s Dan Graham, a veteran of the database industry and now the General Manager of Enterprise Systems: “So can you tell me more about Hadoop? How do we make sense of this? Is this a replacement to traditional database systems? Or is this a technology that augments what we have invested over the past few years?”
At that time, many thought Hadoop was just a Silicon Valley phenomenon, too early for wider adoption. We discussed how Web-driven companies like Yahoo! and Google contributed to Hadoop and promised to continue our dialogue. A small minority of companies used Hadoop as their primary big data processing technology. The vast majority were investing in all types of database technologies, including appliances that were going through their own evolution. I pulled Dan aside and asked: “So what is the scoop about Hadoop? Is this a technology that we need to pay attention to?”
He told me, “Hadoop is currently at the maturity level of Joe Montana when he was seven-eight years old. It has the attributes of top athletes when they are developing.” I decided to pay attention to this technology and see how it would evolve over the next several years.
Now, fast forward to 2011. More organizations are incorporating Hadoop into their IT infrastructures to perform analytics that were not feasible (or at least not cost-effective) before, and to store much more data, so that instead of discarding data they can now use a larger set of data for mining and other purposes. Hadoop, with its sub-projects and related components, has been evolving quickly, supported by its open source community. At the same time, real-world deployments are furthering our understanding of Hadoop's strengths and of the areas that are still evolving. It is important to note that, during the same period, organizations have become far more data-centric, so we have multiple moving targets: new business requirements and technological evolution in both Hadoop and other data processing platforms.
One thing is clear: for the vast majority of organizations, Hadoop in the enterprise is additive to other data technologies, not a replacement for them. To tackle big data, they must take advantage of the best of both worlds, namely Hadoop and the rest of their data infrastructure.
Hadoop is growing and will continue its evolution. What will Hadoop be when it grows up?
Informatica has teamed up with Cloudera to host a 7-part Webinar series, Hadoop Tuesdays, to share our thoughts with a panel of experts on Hadoop and help you create an IT roadmap that incorporates Hadoop.