Several years ago, I got to participate in one of the first neural net conferences. At the time, I thought it was amazing just to be there. There were chip and software vendors galore. Many even claimed to be the next Intel or the next Microsoft. Years later I joined a complex event processing vendor. Again, I felt the same sense of excitement. In both cases, the market participants moved from large horizontal market plays to smaller and more vertical solutions.
Now to be clear, it is not my goal today to pop anyone’s big data balloon. But as I have gotten more excited about big data, I have gotten more and more an eerie sense of deja vu. The fact is the more that I dig into big data and hear customer’s stories about what they are trying to do with big data; the more I have concern about the similarities between big data and neural nets and complex event processing.
Clearly, big data does offer some interesting new features. And big data does take advantage of other market trends including virtualization and cloud. By doing so, big data achieves new orders of scalability than traditional business intelligence processing and storage. At the same time, big data offers the potential for lowering cost but I should take a moment to stress the word potential. The reason I do this is that while a myriad of processing approaches have been developed, no standard has yet emerged. And early adopters complain about having a difficulty in hiring big data map reduce programmers. And just like neural nets, the programing that needs to be done is turning out to be application specific.
With this said, it should be clear that big data does offer the potential to test datasets and to discover new and sometimes unexpected data relationship. This is a real positive thing. However, like its predecessors, this work is application specific and the data that is being related is truly of differing quality and detail. This means that the best that big data can do as a technology movement is discover potential data relationship. Once this is done, meaning can only be created by establishing detailed data relationships and dealing with the varying quality of data sets within the big data cluster.
Big Data will become small for management analysis
This means that big data must become small in order to really solve customer problems. Judith Hurwitz puts it this way, “big data analysis is really about small data. Small data, therefore, is the product of big data analysis. Big data will become small so that it is easier to comprehend”. What is “more necessary than ever is the capability to analyze the right data in a timely enough fashion to make decisions and take actions”. Judith says that in the end what is needed is quality data that is consistent, accurate, reliable, complete, timely, reasonable, and valid. The critical point is whether you use map reduce processing or traditional BI means, you shouldn’t throw out your data integration and quality tools. As big data becomes smaller, these will in reality become increasingly important.
So how does Judith see big data evolving? Judith sees big data propelling a lot of new small data. Judith believes that, “getting the right perspective on data quality can be very challenging in the world of big data. With a majority of big data sources, you need to assume that you are working with data that is not clean”. Judith says that we need to accept the fact that a lot of noise will exist in data. It is by searching and pattern matching that you will be able to find some sparks of truth in the midst of some very dirty data”. Judith suggests, therefore, a two phase approach—1) look for patterns in big data without concern for data quality; and 2) after you locate your patterns, applying the same data quality standards that have been applied to traditional data sources.
For this reason, I believe that history will to a degree repeat itself. Clearly, the big data emperor does have his clothes on, but big data will become smaller and more vertical. Big data will become about relationship discovery and small data will become about quality analysis of data sources. In sum, this means that small data analysis is focused and provides the data for business decision making and big data analysis is broad and is about discovering what data potentially relates to what data. I know this is a bit of different from the hype but it is realistic and makes sense. Remember, in the end, you will still need what business intelligence has refined.