What’s Next for Big Data Analytics?
Analytics Chalk Talks: What’s Next for Big Data Analytics?
Welcome back to our series of Analytics Chalk Talk videos. I hope you enjoyed our first video: “Data Warehouses: Past, Present, and Future”. This week, I want to discuss what’s next in big data analytics and how a robust data management architecture can take you through any change. Below is a transcript from the video as well as the video itself. Please watch and share your views. Find me on Twitter @leanlyle.
I often get asked, ‘What’s coming next in the area of big data analytics?’ I don’t really think that’s the right question. I think what we’ve learned, even in the last couple of years, is that what’s coming next is not what we should be looking at. We just need to realize that something is coming next, and it’s going to displace what is here now.
Think about it. We had Hadoop just five or seven years ago. We started working with it, we built our clusters, we started experimenting with MapReduce, and we got familiar with it. Then, last year, the CEO of one of the big data distribution vendors announced that it’s all about Spark. Well, wait a second. I just invested all this time in MapReduce, and now it’s Spark?
Okay, what’s it going to be next? The important thing is we need to assume that something’s coming next and to prepare for it. We have to constantly think as if something’s going to change.
Preparing for change
So, let’s look at how we can do that. First, we have to focus on the right thing. Data assets are growing, and the world is becoming more complex, with far greater heterogeneity and variety than before. With sensor data, machine data, social media data, and cloud software, the world is disintegrating as far as data goes.
There’s no magic solution to that. You can’t just buy a big data product and hope it just magically joins all that data, cleans it all up and nicely solves all your problems. That’s not the way things work.
So what we have are all these beautiful, next-generation capabilities in the area of predictive analytics, machine learning, statistical analysis. All sorts of powerful, neat things that we wish we had a long time ago. That’s great. Underlying all this, we’ve got some very powerful big data processing and storage capabilities that are continually changing.
One of the things I see is that organizations seem to spend an awful lot of time analyzing these different distributions to see which one is best, which is going to work, or which one they should choose. But here’s the thing: that may not be the best way to spend your time. Pick one, and move on.
What we really need to focus on are these three pillars of data management. These are big data integration, big data governance, and big data security.
Big data integration
With big data, we need to be able to ingest and process data into our analytics environment as quickly as possible. We need to be able to handle reference data, master data, cross-reference data, transaction data, balance data, whatever it might be, in a way that uses common patterns and reuses the logic we’ve already built to manage different types of data. Integration is key because, as we add all these different complex heterogeneous data sources, being able to relate them to one another is critical.
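To make the idea of relating heterogeneous sources concrete, here is a minimal sketch in Python. It assumes a cross-reference table that maps each source system’s local ID to a shared master ID. The names `XREF`, `Record`, and `integrate`, and all of the sample data, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cross-reference data: (source system, local id) -> master id.
XREF = {
    ("crm", "C-17"): "M-001",
    ("billing", "9042"): "M-001",
    ("crm", "C-18"): "M-002",
}


@dataclass
class Record:
    source: str      # which system the record came from
    local_id: str    # that system's own identifier
    payload: dict    # the attributes carried by the record


def integrate(records):
    """Group records from different sources under one master id.

    Records with no cross-reference entry are returned separately so they
    can be reviewed instead of being silently dropped.
    """
    merged = {}
    unmatched = []
    for rec in records:
        master_id = XREF.get((rec.source, rec.local_id))
        if master_id is None:
            unmatched.append(rec)
            continue
        merged.setdefault(master_id, {}).update(rec.payload)
    return merged, unmatched


records = [
    Record("crm", "C-17", {"name": "Ada"}),
    Record("billing", "9042", {"balance": 120.5}),
    Record("sensor", "x1", {"temp": 21}),
]
merged, unmatched = integrate(records)
```

The same pattern applies whichever source is added next: a new source only needs new cross-reference entries, not new matching logic.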
Big data governance
Then we get to big data governance. Here, it’s not only about having an owner, data stewardship, and the other things we think about with traditional data governance. Really, in this big data world where we’re trying to move fast and provide new and really creative analytic capabilities to the business, we don’t want governance to get in the way. Instead, we want governance to let the business work faster.
So, think about putting together a kind of Amazon.com retail experience for data, where the business person can search for what they’re looking for. From the results, ranked by relevance, best sellers, or whatever, they can understand and trust the data: the lineage of where it came from, the definitions and semantics of what the data means, and when it was last updated or defined.
All this information needs to be available. But you also need the ability to understand what other data might be related to this. What other people bought with this data, for instance. So, governance takes on an even broader meaning, so that the big data lake becomes something that folks can ‘fish’ from, in a useful and efficient way.
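As a rough illustration of that retail-style experience, the sketch below models a tiny catalogue in Python with keyword search, lineage, and a “frequently used together” lookup. Everything in it is an assumption made for the example: the `DatasetEntry` fields, the `CATALOG` and `USAGE` contents, and the dates are invented sample data, not a real product’s model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    description: str
    last_updated: str                             # ISO date, sample value
    lineage: list = field(default_factory=list)   # upstream dataset names


# Hypothetical catalogue contents.
CATALOG = {
    "orders": DatasetEntry("orders", "Customer orders by day", "2016-03-01", ["pos_raw"]),
    "customers": DatasetEntry("customers", "Customer master data", "2016-02-20", ["crm_raw"]),
    "web_clicks": DatasetEntry("web_clicks", "Clickstream events", "2016-03-02"),
}

# Hypothetical usage log: the datasets used together in each analysis.
USAGE = [
    {"orders", "customers"},
    {"orders", "customers", "web_clicks"},
    {"orders", "web_clicks"},
    {"orders", "customers"},
]


def search(term):
    """Return entries whose name or description contains the term."""
    term = term.lower()
    return [
        entry for entry in CATALOG.values()
        if term in entry.name.lower() or term in entry.description.lower()
    ]


def also_used_with(name, top=2):
    """Return the datasets most often used alongside the given one."""
    counts = Counter()
    for session in USAGE:
        if name in session:
            counts.update(session - {name})
    return [other for other, _ in counts.most_common(top)]


for entry in search("customer"):
    print(entry.name, entry.last_updated, entry.lineage, also_used_with(entry.name))
```

A real catalogue would rank results and pull lineage from the integration layer, but the shape of the answer is the same: what the data is, where it came from, when it changed, and what else people used with it.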
Big data security
Now, if we did all this without any security on the system, that would obviously not be a good thing. So, what we also need as part of our big data management environment are encryption, masking, and obfuscation capabilities that can be applied to data sets or pieces of data. We also need to be able to understand who’s been looking at what data, and whether or not they were allowed to, while providing the authentication and authorization controls we want over the data.
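Here is a small Python sketch of two of those capabilities working together: masking columns a role is not allowed to see, and recording who looked at what. The `POLICY` table, the role names, and the `read_row` helper are hypothetical. The masking shown is a plain unsalted SHA-256 digest, which is enough to illustrate the idea but would not be strong protection for guessable values such as email addresses.

```python
import hashlib

# Hypothetical policy: the columns each role may see unmasked.
POLICY = {
    "analyst": {"region", "amount"},
    "steward": {"region", "amount", "email"},
}

# In-memory audit trail; a real system would persist this somewhere durable.
AUDIT_LOG = []


def mask(value):
    """Replace a value with a short one-way token derived from SHA-256."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def read_row(user, role, row):
    """Return the row with disallowed columns masked, logging every access."""
    allowed = POLICY.get(role, set())   # unknown roles see nothing unmasked
    result = {}
    for column, value in row.items():
        permitted = column in allowed
        result[column] = value if permitted else mask(value)
        AUDIT_LOG.append({"user": user, "column": column, "unmasked": permitted})
    return result


row = {"email": "a@example.com", "region": "EU", "amount": 10}
print(read_row("pat", "analyst", row))
print(AUDIT_LOG)
```

Because every read goes through one function, the audit log answers both questions from the paragraph above: who looked at which column, and whether they saw the real value.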
Big data management
None of this is provided by the foundational big data systems that you’re used to hearing about. So, this needs to be something that we’re really concentrating on from a big data management perspective. So, what does this depend on?
Once again, we’re back to metadata. It’s all about metadata. Having this universal metadata catalogue is what gives us this transparency: a view of how the integration is done, how the governance of the data is managed, and how the security and the policies are actually followed.
So, now let’s go back to the original question. What’s coming next in big data analytics? The answer to this is not what’s important. The important thing is that new stuff will be coming. So focusing on this big data management layer, and the underlying metadata, is really what you need to manage all that change.