There is a huge amount of buzz and hype in the market around big data. Terms like Hadoop, Cassandra, Hive and NoSQL are thrown around constantly, and they can seem largely detached from most people’s day-to-day reality. Particularly for folks who do the heavy lifting of data integration and data management for their organizations, all the buzz can seem like mere noise. I often hear comments such as:
- “We don’t do big data here. Our volumes aren’t that big.”
- “There are some folks playing around with Hadoop in the lab, but that’s about it.”
- “I think this may have potential use for us, but I’m not really sure. There’s too much hype right now. And I’m way too busy to sort through it all.”
Big data: It is your problem…
You may think big data is not relevant to you. But more than likely, big data will become part of your reality over the next few years. Why?
First, three inexorable trends have fundamentally shifted the landscape of computing, and in turn, they are reshaping the world of data.
- Cloud computing is now mainstream, representing 4% of total IT spend, and it’s here to stay. The proliferation of cloud applications is creating the next wave of data fragmentation, with more and more data captured and processed by cloud service providers.
- Social computing has risen from almost nowhere just five years ago to generate unprecedented volumes of interaction data: Tweets, Facebook posts, Zynga games, LinkedIn connections, Yelp! reviews, and more. There’s vast opportunity in doing something with all that data (in fact, an estimated 88% of US companies use social media for marketing), but it can be hard to tap.
- Mobile computing has leapt forward in recent years, in no small part due to the iPhone and iPad, as well as supporting players like Android. Experts predict that within the next five years “more users will connect to the Internet over mobile devices than desktop PCs.” All this mobile computing is driving the next surge in device and sensor interaction data, with the opportunity to utilize mobile device data to enable innovative context-aware and location-based services.
Second, big data is not just about large volumes. Plenty of us don’t need to deal with petabytes of data. But even where data volume isn’t overwhelming, data variety, velocity and complexity are increasing because of the trends above. Data is no longer just an orderly set of rows and columns generated by transactional applications. It’s a messy agglomeration of unpredictable data from a huge variety of sources, many of which behave very differently from traditional enterprise data. That’s the true challenge of big data.
Some aspect of this is relevant to almost every organization. So even if these factors haven’t hit you personally yet, they will, and soon.
Big data: It is your problem… and your opportunity.
So what does that mean for a data integration or information management professional? It means that change is coming. And change is a double-edged sword. You can use it as an opportunity to do something new and of value—for yourself, and for your organization—or it can pass you by.
Big data has been billed as the “next frontier for innovation, competition and productivity.” Right now, big data often means mining huge volumes of data, such as social media, clickstream or device log data, to find new insights. But it also means detecting interesting events in real time as they occur (based on web, device, sensor or social media activity) and then taking some action on those events. It means combining data from the messy new world with your existing enterprise systems to get a more complete view of your customers and your business. And it can even bring the opportunity to pursue a whole new business model based on the value inherent in this new data.
Big data indeed provides a big opportunity. But it can also bring unforeseen costs and risks if not managed properly. The data volumes, variety and complexity can lead to spiraling costs, even with the use of low-cost computing platforms such as Hadoop. The development and analysis skills required to use big data are scarce and expensive. And most big data projects are not yet paying attention to critical enterprise issues such as manageability, security, availability or governance.
You need to balance the new demands posed by big data with your existing requirements and resources. And you need to deliver the new value expected by the business while mitigating the cost and risk of big data.
We’ve been spending a lot of time talking with folks about these big data issues, and we would love to hear more about your experiences. Please take our big data survey: you have a chance to win an iPad (yes, the new one), and we’ll be publishing the report in May. I’m sure there will be some very interesting results in there for us all to discuss.
Also, the R&D experts at the Informatica labs are working on a lot of innovative capabilities to address many of these challenges. Informatica World is the best place to learn about what we’re up to, so please join us in Vegas to hear it all firsthand.