Is Big Data Different because of its Audience?
I do not know about you, but every once in a while, two authors' ideas snap together almost as if one person had created them. More importantly, sometimes these ideas together explain something really important. This happened recently, at least for me, on the topic of big data.
Gartner, in “The Integrators Dilemma: Can a Bimodal Approach Balance Integration Agility and Control?” (Gartner Foundational, November 4, 2015), discusses the notion of two modes of data integration: the traditional systematic approach (high control, “one size fits all”) and the emerging adaptive approach (high speed, “just do it”). This split in the data integration market makes sense when you consider how Derek Strauss, the CDO for TD Ameritrade, describes the personas for data integration and business intelligence.
Mode 1: Traditional
These folks are largely concerned with continuous process improvement to well-founded business processes. Derek Strauss says that the buyer personas for this mode have well-defined needs.
- Farmers: They are predictable and know exactly what they want
- Tourists: They know where to find things
- Operators: They live in the body of the business. For example, they live in the call center. They need some repeatable facts with a fast response
According to Derek, the data warehouse responded to the needs of these personas well because it was about perfecting the data going into it. This approach to information management demanded data quality and data improvement. Derek says that in the data warehouse, we put the data elements together and dealt with bad data. We fixed bad data at the source, says Derek, by fixing the program, by brokering process change, and so on. Put simply, these personas demanded what the data warehouse provided, and they had a priori knowledge of the data they needed and how to use it.
Mode 2: Emerging
The audiences for traditional data warehousing and big data are fundamentally different. The personas here, in contrast to mode 1, do not have a clear set of ends in mind for the data in advance. For this reason, they want self-service. They are about learning and experimentation.
Derek Strauss describes two personas as fitting into Mode 2.
- Explorers: They say, “Give me the raw data and I will figure out if there is anything of value in it.”
- Miners: They need data in depth. They need every piece of data in order to figure out how to run things better
According to Derek, data warehousing was great for solving the needs of farmers, tourists, and operators, but it did not address the needs of explorers or miners. In fact, from personal experience, meeting these latter two personas' requirements would have broken the traditional data warehouse. Derek says that big data, for the first time, addresses the needs of explorers and miners. This is why these two populations are so excited. For explorers and miners, it effectively solves world hunger. These folks, as we have said, believe that data is useful in its raw state. In the case of miners, this is because they are looking for patterns and signals. They want to tell a story with data.
Merging Mode 1 and Mode 2 Thinking
The problem is that neither mode is a silo. Typically, raw data needs to be infused with context, which means it needs what is already locked away in the data warehouse. Think about wanting to do predictive offers for customers and, as a result, needing to merge customer purchase data with social data. Additionally, while miners and explorers may claim that raw data is fine, every time they discover a new pattern in the data, that data needs to be operationalized so other personas can use it too, especially because many mode 1 users can be found in the executive ranks. This means making the miner and explorer data trustworthy for business decision making. Otherwise, miners and explorers will see limited value from all their work. Derek says, “as I and Inmon defined, we need to do the same thing to unstructured and semi-structured data.”
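To make the predictive-offer example a little more concrete, here is a minimal Python sketch of what merging curated purchase data with raw social data might look like. Everything in it is an assumption for illustration: the record layouts, the field names (customer_id, category, topic), and the helper functions are hypothetical and do not come from Derek or Gartner.

```python
from collections import defaultdict

# Hypothetical curated rows, as they might come out of a data warehouse.
purchases = [
    {"customer_id": 1, "category": "retirement", "amount": 5000.0},
    {"customer_id": 1, "category": "options", "amount": 1200.0},
    {"customer_id": 2, "category": "retirement", "amount": 800.0},
]

# Hypothetical raw social records; the last one has no customer key at all.
social_posts = [
    {"customer_id": 1, "topic": "options"},
    {"customer_id": 2, "topic": "college savings"},
    {"customer_id": None, "topic": "options"},
]


def build_customer_context(purchases, social_posts):
    """Combine curated purchase totals with raw social topics per customer."""
    context = defaultdict(lambda: {"spend": defaultdict(float), "topics": set()})
    for row in purchases:
        context[row["customer_id"]]["spend"][row["category"]] += row["amount"]
    for post in social_posts:
        # Raw data without a customer key cannot be tied to anyone, so skip it.
        if post["customer_id"] is None:
            continue
        context[post["customer_id"]]["topics"].add(post["topic"])
    return context


def offer_candidates(context):
    """List topics a customer talks about but has not yet bought into."""
    return {
        customer_id: sorted(info["topics"] - set(info["spend"]))
        for customer_id, info in context.items()
    }


candidates = offer_candidates(build_customer_context(purchases, social_posts))
print(candidates)
```

Even in a toy like this, the raw social record with no customer key has to be thrown away, which is a small version of the trust problem: the explorer's finding is only usable by others once it can be tied back to governed data.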
In conclusion, mode 1 and mode 2 have distinct stakeholders. Each is trying to accomplish different things. What matters is that together both modes of users can solve significant business problems. It is for this reason that I believe that over time the worlds of business intelligence and big data will become closer and closer. At some point, they will be indistinguishable. At that point, one will start where the other one ends.