If you’re reading this article, you’re probably interested in big data, but don’t really know what you’re looking for in it or how you’ll find it. Don’t feel confused. It’s not like traditional analytics, where you know the structure of the data – the relations across sources, the dimensions to build, the calculations to perform – and the reports you need. Big data can be completely unstructured, with no clear relationships. And you don’t know what you’re looking for until you find patterns. A complaint might come in about an online shopping cart being wiped out, which is what happened to my wife with a big toy retailer during some online Christmas shopping. I’ll tell you right now, they ended up losing a lot of money. If they’re using big data, they might find the pattern of steps that made her hit that bug. Then they might search for all the customers who had the same problem, get their names and e-mail addresses, and run a recovery campaign. I hope the retailer is using big data properly. My wife would receive a call and place that order. They’d be happy, and I’d be happy.
But I don’t think it will happen, because it’s not so simple. Finding a big insight isn’t enough. You have to share it and act on it. The need to share is what makes the EMC Chorus 2.0 announcement so interesting. EMC’s Unified Analytics Platform includes the Greenplum Database, Greenplum HD (Hadoop), and Chorus for collaboration … wait, collaboration? Yes, collaboration is critical. Remember, with big data you don’t quite know what you’re looking for. You need to share insights fast to find out whether they’re important. Chorus 2.0 lets you not only provision new data stores as “sandboxes” that you can share, but also share insights about the data itself.
Once you share, you then need to act. But because you didn’t know what you were looking for, the integration you need to act on it doesn’t exist yet. A much more agile form of data integration is needed if you’re going to act on an insight in time to benefit from it. Remember, my wife and I move on to other shopping sites pretty quickly. That’s where data virtualization comes in. It gives analysts and IT the ability to quickly merge data from Greenplum with data from traditional applications, like e-mail addresses or phone numbers, on the fly, and then move the combined results into Salesforce or another CRM application. With data virtualization, these integrations can be built fast, within days.
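To make that concrete, here’s a rough sketch in Python of the kind of join I have in mind: take the customer IDs that an analysis flagged as hitting the cart bug, match them against contact records from a traditional application, and produce a list ready to load into a CRM. Everything in it is an assumption for illustration. The `Contact` type, the `build_recovery_list` function, the IDs, and the sample data are all hypothetical, and it does not use any real Greenplum, Chorus, or Salesforce API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """Hypothetical contact record from a traditional customer application."""
    customer_id: str
    name: str
    email: str


def build_recovery_list(affected_ids, contacts):
    """Join flagged customer IDs with contact records.

    Duplicate IDs are collapsed, and IDs with no matching
    contact record are skipped.
    """
    by_id = {c.customer_id: c for c in contacts}
    seen = set()
    recovery = []
    for cid in affected_ids:
        if cid in seen or cid not in by_id:
            continue
        seen.add(cid)
        contact = by_id[cid]
        recovery.append(
            {"customer_id": cid, "name": contact.name, "email": contact.email}
        )
    return recovery


# Hypothetical output of the big data analysis: customers who hit the cart bug.
affected_ids = ["c-102", "c-317", "c-102", "c-999"]

# Hypothetical rows from the traditional application that holds contact details.
contacts = [
    Contact("c-102", "Ana", "ana@example.com"),
    Contact("c-317", "Raj", "raj@example.com"),
    Contact("c-555", "Lee", "lee@example.com"),
]

recovery_list = build_recovery_list(affected_ids, contacts)
for row in recovery_list:
    print(row["customer_id"], row["name"], row["email"])
```

In a real setup the two inputs would come from live sources rather than hard-coded lists, and the result would be pushed into the CRM instead of printed. The join itself is the easy part; getting at both sources quickly is what data virtualization is supposed to buy you.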
Tools like Chorus, data virtualization, and increased self-service for integration in general are critical if you’re really going to leverage big data. I wish companies had gotten these tools last year. I can only hope that if my wife hits an issue on a site next year, she’ll get that call.