Are data lakes a good thing?
This was the debate going back and forth at the recent Data Summit, held in New York. Interestingly, the roster of speakers – representing a range of industry experts – was sharply divided on the value of data lakes to enterprises. Some saw data lakes – central repositories of raw data that is simply collected, and structured and processed at a later time when needed by an application – as risky business, while others regarded them as the logical way to make the most of the big data tsunami.
In a keynote panel discussion early in the conference, Miles Kehoe, search evangelist at Avalon Consulting, and Anne Buff, business solutions manager at the SAS Institute, expressed caution about data lakes. Buff, for one, said data lakes were great technology tools, but didn’t make sense for the business. “I argue vehemently against it,” she said. “Not because it isn’t valuable. From an analytics standpoint it’s a great playground or sandbox because it’s this utopia of putting our data in one place and make it naked, make it raw so we could do whatever we want with it,” Buff said. “But that’s the biggest risk you could imagine. Let’s put every piece of data we ever had in our company in one place, and tell everybody about it.”
The problem, Buff continued, was data security and privacy. The only insurance against this risk is an organization with a “good data governance program where people respect data,” along with certification of the individuals who handle it. However, Buff continued, such best practices are not very common in enterprises. She referred to the notion of secure data lakes as resting on a “utopic belief that we can get all data in one place.”
Kehoe agreed with Buff, comparing the idea of the data lake to the “x” drive that was part of earlier PC networks. He cautioned that organizations may not have enough control over the content of the data being stored within a data lake. “You’re putting stuff there, and you don’t know what it is,” he said. “You may have things that expose you to sexual harassment lawsuits, for example. Can you imagine people copying their files, and shoving it up to a file share somewhere, so it’s publicly available, with no security.”
Some experts say the idea of having such an x drive, or “a big dumb disk,” is fine. “If that’s what it takes to get data there, by all means, put it on that dumb disk,” said David Mariani, CEO of AtScale.
Mariani joined a panel later that day, which I moderated, that also included Wendy Gradek, senior manager with EMC, and Andy Schroepfer, chief strategy officer at Hosting, both of whom expressed great support for the data lake concept.
Data lakes help address what may be the greatest challenge for many enterprises today – disparate data sources, and the inertia they create within enterprises, said Gradek. “I don’t know how many times I’ve been told the information we need is six months out, or it’s about a year out. That’s not going to work for the business — their goals are very much weekly driven, especially in sales, where if you don’t make your numbers, and you don’t have visibility into your data, you’re running blind.” The key to resolving this is supporting disparate data sources in a single enterprise location, she continued. “We need it to be in a central repository in its original state, so when we have those questions we can go to it and apply the logic as close to query time as possible and get what we need quickly.”
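Gradek’s “apply the logic as close to query time as possible” is essentially the schema-on-read pattern: raw records land untouched, and structure is imposed only when a question is asked. A minimal Python sketch of the idea, with invented field names and figures:

```python
import json

# Raw events land in the "lake" untouched -- no upfront schema is imposed.
raw_events = [
    '{"rep": "alice", "region": "east", "amount": "1200.50"}',
    '{"rep": "bob", "region": "west", "amount": "980.00", "note": "renewal"}',
    '{"rep": "alice", "region": "east", "amount": "310.25"}',
]

def weekly_sales_by_rep(raw):
    """Apply structure at query time (schema-on-read): parse and cast
    only the fields this particular question needs."""
    totals = {}
    for line in raw:
        event = json.loads(line)
        rep = event["rep"]
        totals[rep] = totals.get(rep, 0.0) + float(event["amount"])
    return totals

print(weekly_sales_by_rep(raw_events))  # {'alice': 1510.75, 'bob': 980.0}
```

Note that the optional `note` field on the second record causes no trouble, because no schema was enforced at load time; a different question next week could parse entirely different fields from the same raw store.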
Mariani agreed, noting how he “came to a realization that data movement is evil. Data is like water. It’s very expensive and difficult to move once it lands someplace.” Today’s data volumes have “grown beyond our ability to pre-process it or to pre-structure it, to build structures to answer questions today.”
Schroepfer stated that it’s better to have data in one place, “as opposed to distributed data sitting on different peoples’ desktops, sitting in different peoples’ Excel spreadsheets. To me, that’s far worse than having a centralized store where you can lock it down and provide access. It’s as good and as clean as you want it to be.”
(Disclosure: the author is a contributor to Database Trends & Applications, published by Information Today, Inc., host of the Data Summit mentioned above.)
There are lots of really fascinating applications coming out of the big data space as of late, and I recently came across one that may be the coolest of them all. There’s a UK-based firm that is employing big data to help predict earthquakes.
Unfortunately, predicting earthquakes has thus far been almost impossible. Imagine if people living in an earthquake zone could get at least several hours’ notice, maybe even several days, just as those in the paths of hurricanes can receive advance warning and flee or prepare. Hurricane and storm modeling is one of the earliest examples of big data in action, going back decades. The big data revolution may now be on the verge of enabling earthquake prediction modeling as well.
Bernard Marr, in a recent Forbes post, explains how Terra Seismic employs satellite data to sense impending shakers:
“The systems use data from US, European and Asian satellite services, as well as ground based instruments, to measure abnormalities in the atmosphere caused by the release of energy and the release of gases, which are often detectable well before the physical quake happens. Large volumes of satellite data are taken each day from regions where seismic activity is ongoing or seems imminent. Custom algorithms analyze the satellite images and sensor data to extrapolate risk, based on historical facts of which combinations of circumstances have previously led to dangerous quakes.”
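The approach Marr describes (extrapolating risk from historical combinations of readings) can be sketched in miniature. The toy model below, with invented feature names and numbers, illustrates only the general pattern of similarity-weighted extrapolation from historical cases; it is not Terra Seismic’s actual algorithm, which is proprietary:

```python
# Historical cases: anomaly readings observed before past events, and
# whether a dangerous quake followed. All values here are invented.
historical_precursors = [
    # (gas_release_index, thermal_anomaly, quake_followed)
    (0.9, 0.8, True),
    (0.2, 0.1, False),
    (0.7, 0.9, True),
    (0.3, 0.2, False),
]

def risk_score(gas, thermal, history):
    """Nearest-neighbour style extrapolation: weight each historical case
    by its similarity to today's readings, then average the outcomes."""
    weighted, total = 0.0, 0.0
    for g, t, quake in history:
        similarity = 1.0 / (0.01 + abs(g - gas) + abs(t - thermal))
        weighted += similarity * (1.0 if quake else 0.0)
        total += similarity
    return weighted / total

# Today's readings resemble the quake-preceding cases, so risk is high.
print(round(risk_score(0.85, 0.85, historical_precursors), 2))  # 0.9
```

A production system would of course use far richer features, satellite imagery, and validated statistical models, but the core idea of scoring current conditions against historical precursor combinations is the same.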
So far, Marr reports, Terra Seismic has been able to predict major earthquakes anywhere in the world with 90% accuracy. Among them is a prediction, issued on February 22nd, that a 6.5-magnitude quake would hit the Indonesian island of Sumatra. The island was hit by a 6.4-magnitude quake on March 3rd.
There’s no question that the ability to accurately forecast earthquakes – at least as closely as hurricanes and major blizzards can be predicted – will not only save many human lives, but also be invaluable to government agencies and businesses as well.
At the same time, such creative – and potentially game-changing – applications of big data provide very graphic examples of how data is converted to insights that were never possible before. Many business leaders are looking for ways to shine a light on potential events within their organizations and markets, and examples such as Terra Seismic accentuate the positive benefits big data can deliver.
Terra Seismic’s forecasts are available through a public website: http://quakehunters.com/
I recently had the opportunity to participate in the “CDO Summit,” hosted by the CDO Club and Capgemini Consulting. While the CDO in this case meant Chief Digital Officer, I noticed some of the speakers had “Data” in their titles, suggesting a close alignment with CDO as Chief Data Officer as well. In fact, the conference program was packed full of discussion and presentations on how data analytics was shifting the game for many enterprises.
Not too long ago, I asked a group of executives what the difference between chief data officer and chief digital officer was. Generally, chief data officers were seen as reporting to chief digital officers, as data was the key component of broader efforts to move to digital enterprise. The chief digital officer is assumed to have roles encompassing various aspects of content development, sales and marketing, operations, production, finance, and product development.
Then again, the chief data officer will also be immersed in these areas of the business as well.
The overlap and convergence between the two CDOs mirrors what’s happening in many organizations. Many recognize the opportunities now available through digital channels, and the efficiencies that can be gained by adding intelligence to products and services. At the same time, this can only be accomplished by capturing, analyzing and monetizing the data that is generated by or supports these digital efforts.
This means converged responsibilities, skill demands and opportunities for a range of positions across enterprises – not just CDOs.
Nevertheless, these two types of executives are likely to be looking at things from different perspectives. For example, in terms of background, the chief data officer is likely to have a background in statistical analysis, and may come up through the ranks as a data scientist. Many chief digital officers are coming out of marketing or IT.
Thus, you are likely to find chief data officers worry more about the data, and how it is being created, handled, and secured, while chief digital officers focus on the bigger picture.
For some perspective on the roles of chief data officers, Dr. Anne Marie Smith, principal consultant at Alabama Yankee Systems, LLC, describes the scope of responsibilities in a report out of the Cutter Consortium:
- Articulate the enterprise’s data vision
- Serve as “champion for global data management, governance, quality, and vendor relationships across the enterprise.”
- Work with “executives, data owners, and data stewards to achieve data accuracy and process requirement goals for all internal and external customers.”
- Oversee “the monitoring of data quality efforts within the organization.”
- Lead the education of the organization “on data management concepts, the appropriate usage of data, enterprise master data management and data quality concepts, enterprise decision-support concepts, data vendor capabilities, definition and appropriateness of data management, rules on data access, and other data-related issues.”
The responsibilities of chief digital officers don’t fall too far from those of chief data officers, as they also call for data leadership. The roles of this CDO as explained by Sam Ramji, Vice President of strategy at Apigee, include the following:
- Articulate the enterprise’s digital strategy – how a digital transformation will help the organization “meet the challenges of a mobile-first world, digital partnerships, and new forms of competition,” as well as “build a consistent experience for customers across different lines of business in order to produce network effects for the enterprise.”
- Earn company-wide commitment for the digital strategy – serving “as a culture broker, establishing a single vision that spans businesses and technologies and being the active champion who gets everyone on board to execute that vision.”
- Embrace data-based experimentation — facilitate the ability to experiment repeatedly in the digital realm, with the expectation that failure is the most important part of innovation.
- Drive for tangible and measurable results.
- Connect with experts in the company and the broader industry.
- Speak multiple business languages (IT, marketing, strategy, finance).
Last fall, at a large industry conference, I had the opportunity to conduct a series of discussions with industry leaders in a portable video studio set up in the middle of the conference floor. As part of our exercise, we had a visual artist do freeform storyboarding of the discussion on large swaths of five-foot by five-foot paper, which we then reviewed at the end of the session. For example, in a discussion of cloud computing, the artist drew a rendering of clouds, raining data on a landscape below, illustrated by sketches of office buildings. At a glance, one could get a good read of where the discussion went, and the points that were being made.
Data visualization is one of those up-and-coming areas that has just begun to break out of the technology zone. There are some powerful front-end tools that help users see, at a glance, trends and outliers through graphical representations – be they scattergrams, histograms, 3D diagrams or something else eye-catching. The “Infographic” that has become so popular in recent years is an amalgamation of data visualization and storytelling. The bottom line is that technology is making it possible to generate these representations almost instantly, enabling relatively quick understanding of what the data may be saying.
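To make the “almost instantly” point concrete, here is a minimal standard-library sketch, with made-up numbers, of turning raw values into an at-a-glance representation. A real tool renders proper graphics, but even a text histogram makes an outlier jump out:

```python
import random

random.seed(7)
# A hypothetical metric: 200 routine daily readings plus one outlier day.
values = [random.gauss(100, 10) for _ in range(200)] + [180]

def ascii_histogram(data, bins=8, width=40):
    """Render a quick text histogram -- an outlier shows up as a lonely
    bar far from the main cluster."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        idx = min(int((x - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * max(1 if c else 0, round(width * c / peak))
        lines.append(f"{lo + i * step:7.1f} | {bar}")
    return "\n".join(lines)

print(ascii_histogram(values))
```

The cluster of long bars around 100 and the single short bar near 180 tell the story in a glance, which is precisely the appeal of visualization over a table of 201 numbers.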
The power that data visualization is bringing organizations was recently explored by Benedict Carey in The New York Times, who discussed how data visualization is emerging as the natural solution to “big data overload.”
This is much more than a front-end technology fix, however. Rather, Carey cites a growing body of knowledge emphasizing the development of “perceptual learning,” in which people working with large data sets learn to “see” patterns and interesting variations in the information they are exploring. It’s almost a return to the “gut” feel for answers, but one developed for the big data era.
As Carey explains it:
“Scientists working in a little-known branch of psychology called perceptual learning have shown that it is possible to fast-forward a person’s gut instincts both in physical fields, like flying an airplane, and more academic ones, like deciphering advanced chemical notation. The idea is to train specific visual skills, usually with computer-game-like modules that require split-second decisions. Over time, a person develops a ‘good eye’ for the material, and with it an ability to extract meaningful patterns instantaneously.”
Video games may be leading the way in this – Carey cites the work of Dr. Philip Kellman, who developed a video-game-like approach to training pilots to instantly “read” instrument panels as a whole, versus pondering every gauge and dial. He reportedly was able to enable pilots to absorb within one hour what normally took 1,000 hours of training. Such perceptual-learning-based training is now employed in medical schools to help prospective doctors become familiar with complicated procedures.
There are interesting applications for business, bringing together a range of talent to help decision-makers better understand the information they are looking at. In Carey’s article, an artist was brought into a medical research center to help scientists look at data in many different ways – to get out of their comfort zones. For businesses, it means getting away from staring at bars and graphs on their screens and perhaps turning data upside down or inside-out to get a different picture.
For those hoping to push through a hard-hitting analytics effort that will serve as a beacon of light within an otherwise calcified organization, you probably have your work cut out for you. Evolving into an organization that fully grasps the power and opportunities of data analytics requires cultural change, and this is a challenge organizations have only begun to grasp.
“Sitting down with pizza and coffee could get you around most of the technical challenges,” explained Sam Ransbotham, Ph.D., associate professor at Boston College, at a recent panel webcast hosted by MIT Sloan Management Review, “but the cultural problems are much larger.”
That’s one of the key takeaways from the panel, in which Ransbotham was joined by Tuck Rickards, head of the digital transformation practice at Russell Reynolds Associates, a digital recruiting firm, and Denis Arnaud, senior data scientist at Amadeus Travel Intelligence. The panel, which examined the impact of corporate culture on data analytics, was led by Michael Fitzgerald, contributing editor at MIT Sloan Management Review.
The path to becoming an analytics-driven company is a journey that requires transformation across most or all departments, the panelists agreed. “It’s fundamentally different to be a data-driven decision company than kind of a gut-feel decision-making company,” said Rickards. “Acquiring this capability to do things differently usually requires a massive culture shift.”
That’s because the cultural aspects of the organization – “the values, the behaviors, the decision making norms and the outcomes go hand in hand with data analytics,” said Ransbotham. “It doesn’t do any good to have a whole bunch of data processes if your company doesn’t have the culture to act on them and do something with them.” Rickards adds that bringing this all together requires an agile, open source mindset, with frequent, open communication across the organization.
So how does one go about building and promoting a culture that is conducive to getting the maximum benefit from data analytics? The most important piece is bringing aboard people who are aware of and skilled in analytics – both from within the enterprise and from outside, the panelists urged. Ransbotham pointed out that it may seem daunting, but it’s not. “This is not some gee-whizz thing,” he said. “We have to get rid of this mindset that these things are impossible. Everybody who has figured it out has figured it out somehow. We’re a lot more able to pick up on these things than we think — the technology is getting easier, it doesn’t require quite as much as it used to.”
The key to evolving corporate culture to becoming more analytics-driven is to identify or recruit enlightened and skilled individuals who can provide the vision and build a collaborative environment. “The most challenging part is looking for someone who can see the business more broadly, and can interface with the various business functions –ideally, someone who can manage change and transformation throughout the organization,” Rickards said.
Arnaud described how his organization – an online travel service – went about building an esprit de corps between data analytics staff and business staff to ensure the success of the company’s analytics efforts. “Every month all the teams would do a hands-on workshop, together in some place in Europe [Amadeus is headquartered in Madrid, Spain].” For example, a workshop may focus on a market analysis for a specific customer, and the participants would explore the entire end-to-end process for working with the customer, “from the data collection all the way through to data acquisition through data crunching and so on. The one knowing the data analysis techniques would explain them, and the one knowing the business would explain that, and so on.” As a result of these monthly workshops, business and analytics team members have found it “much easier to collaborate,” he added.
Web-oriented companies such as Amadeus – or Amazon and eBay for that matter — may be paving the way with analytics-driven operations, but companies in most other industries are not at this stage yet, both Rickards and Ransbotham point out. The more advanced web companies have built “an end-to-end supply chain, wrapped around customer interaction,” said Rickards. “If you think of most traditional businesses, financial services or automotive or healthcare are a million miles away from that. It starts with having analytic capabilities, but it’s a real journey to take that capability across the company.”
The analytics-driven business of the near future – regardless of industry – will likely be staffed with roles not seen today. “If you are looking to re-architect the business, you may be imagining roles that you don’t have in the company today,” said Rickards. Along with the need for chief analytics officers, data scientists, and data analysts, there will be many new roles created. “If you are on the analytics side of this, you can be in an analytics group or a marketing group, with more of a CRM or customer insights title. You can be in planning or business functions. In a similar way on the technology side, there are people very focused on architecture and security.”
Ultimately, the demand will be for leaders and professionals who understand both the business and technology sides of the opportunity, Rickards continued. “You can have good people building a platform, and you can have good data scientists,” he added. “But you better have someone on the top of that organization knowing the business purpose.”
What does it take to be an analytics-driven business? That’s a question that requires a long answer. Recently, Gartner research director Lisa Kart took on this question, noting that the key to becoming an analytics-driven business lies in breaking down data silos.
So, the secret of becoming an analytics-driven business is to bust down the silos — easier said than done, of course. The good news, as Kart tells it, is that one doesn’t need to be casting a wide net across the world in search of the right data for the right occasion. The biggest opportunities are in connecting the data you already have, she says.
Taking Kart’s differentiation of just-using-analytics versus analytics-driven culture a step further, here is a brief rundown of how businesses just using analytics approach the challenge, versus their more enlightened counterparts:
Business just using analytics: Lots of data, but no one really understands how much is around, or what to do with it.
Analytics-driven business: The enterprise has a vision and strategy, supported from the top down, closely tied to the business strategy. Management also recognizes that existing data has great value to the business.
Business just using analytics: Every department does its own thing, with varying degrees of success.
Analytics-driven business: Makes connections between all the data – of all types — floating around the organization. For example, gets a cross-channel view of a customer by digging deeper and connecting the silos together to transform the data into something consumable.
Business just using analytics: Some people in marketing have been collecting customer data and making recommendations to their managers.
Analytics-driven business: Marketing departments, through analytics, engage and interact with customers, Kart says. An example would be creating high-end, in-store customer experiences that give customers greater intimacy and interaction.
Business just using analytics: The CFO’s staff crunches numbers within their BI tools and arrive at what-if scenarios.
Analytics-driven business: Operations and finance departments share online data to improve performance using analytics. For example, a company may tap into a variety of data, including satellite images, weather patterns, and other factors that may shape business conditions, Kart says.
Business just using analytics: Some quants in the organization pore over the data and crank out reports.
Analytics-driven business: Encourages maximum opportunities for innovation by putting analytics in the hands of all employees. Analytics-driven businesses recognize that more innovation comes from front-line decision-makers than the executive suite.
Business just using analytics: Decision makers put in report requests to IT for analysis.
Analytics-driven business: Decision makers can go to an online interface that enables them to build and display reports with a click (or two).
Business just using analytics: Analytics spits out standard bar charts, perhaps a scattergram.
Analytics-driven business: Decision makers can quickly visualize insights through 3D graphics, also reflecting real-time shifts.
Despite more than $30 billion in annual spending on Big Data, successful big data implementations elude most organizations. That’s the sobering assessment of a recent Capgemini study of 226 senior executives, which found that only 13 percent feel they have truly made headway with their big data efforts.
The reasons for Big Data’s lackluster performance include the following:
- Data is in silos or legacy systems, scattered across the enterprise
- No convincing business case
- Ineffective alignment of Big Data and analytics teams across the organization
- Most data locked up in petrified, difficult to access legacy systems
- Lack of Big Data and analytics skills
Actually, there is nothing new about any of these issues – in fact, the perceived issues with Big Data initiatives so far map closely with the failed expectations of many other technology-driven initiatives. First, there’s the hype that tends to get way ahead of any actual well-functioning case studies. Second, there’s the notion that managers can simply take a solution of impressive magnitude and drop it on top of their organizations, expecting overnight delivery of profits and enhanced competitiveness.
Technology, and Big Data itself, is but a tool that supports the vision, well-designed plans and hard work of forward-looking organizations. Those managers seeking transformative effects need to look deep inside their organizations, at how deeply innovation is allowed to flourish, and in turn, how their employees are allowed to flourish. Think about it: if line employees suddenly have access to alternative ways of doing things, would they be allowed to run with it? If someone discovers through Big Data that customers are using a product differently than intended, do they have the latitude to promote that new use? Or do they have to go through chains of approval?
Big Data may be what everybody is after, but Big Culture is the ultimate key to success.
For its part, Capgemini provides some high-level recommendations for better baking in transformative values as part of Big Data initiatives, based on their observations of best-in-class enterprises:
The vision thing: “It all starts with vision,” says Capgemini’s Ron Tolido. “If the company executive leadership does not actively, demonstrably embrace the power of technology and data as the driver of change and future performance, nothing digitally convincing will happen. We have not even found one single exception to this rule. The CIO may live and breathe Big Data and there may even be a separate Chief Data Officer appointed – expect more of these soon – if they fail to commit their board of executives to data as the engine of success, there will be a dark void beyond the proof of concept.”
Establish a well-defined organizational structure: “Big Data initiatives are rarely, if ever, division-centric,” the Capgemini report states. “They often cut across various departments in an organization. Organizations that have clear organizational structures for managing rollout can minimize the problems of having to engage multiple stakeholders.”
Adopt a systematic implementation approach: Surprisingly, even the largest and most sophisticated organizations that do everything on process don’t necessarily approach Big Data this way, the report states. “Intuitively, it would seem that a systematic and structured approach should be the way to go in large-scale implementations. However, our survey shows that this philosophy and approach are rare. Seventy-four percent of organizations did not have well-defined criteria to identify, qualify and select Big Data use-cases. Sixty-seven percent of companies did not have clearly defined KPIs to assess initiatives. The lack of a systematic approach affects success rates.”
Adopt a “venture capitalist” approach to securing buy-in and funding: “The returns from investments in emerging digital technologies such as Big Data are often highly speculative, given the lack of historical benchmarks,” the Capgemini report points out. “Consequently, in many organizations, Big Data initiatives get stuck due to the lack of a clear and attributable business case.” To address this challenge, the report urges that Big Data leaders manage investments “by using a similar approach to venture capitalists. This involves making multiple small investments in a variety of proofs of concept, allowing rapid iteration, and then identifying PoCs that have potential and discarding those that do not.”
Leverage multiple channels to secure skills and capabilities: “The Big Data talent gap is something that organizations are increasingly coming face-to-face with. Closing this gap is a larger societal challenge. However, smart organizations realize that they need to adopt a multi-pronged strategy. They not only invest more on hiring and training, but also explore unconventional channels to source talent.” Capgemini advises reaching out to partner organizations for the skills needed to develop Big Data initiatives. These can be employee exchanges, or “setting up innovation labs in high-tech hubs such as Silicon Valley.” Startups may also be another source of Big Data talent.
By now, the business benefits of effectively leveraging big data have become well known. Enhanced analytical capabilities, greater understanding of customers, and ability to predict trends before they happen are just some of the advantages. But big data doesn’t just appear and present itself. It needs to be made tangible to the business. All too often, executives are intimidated by the concept of big data, thinking the only way to work with it is to have an advanced degree in statistics.
There are ways to make big data more than an abstract concept that can only be loved by data scientists. Four of these ways were recently covered in a report by David Stodder, director of business intelligence research for TDWI, as part of TDWI’s special report on What Works in Big Data.
Experiment with real-time analytics
The time is ripe for experimentation with real-time, interactive analytics technologies, Stodder says. The next major step in the movement toward big data is enabling real-time or near-real-time delivery of information. Real-time delivery has been a challenge for BI for years, with limited success, Stodder says. The good news is that the Hadoop framework, originally built for batch processing, now includes interactive querying and streaming applications, he reports. This opens the way for real-time processing of big data.
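The batch-versus-streaming distinction behind this shift can be sketched simply: rather than recomputing over the full history in a nightly batch job, a streaming job folds each event into a running aggregate as it arrives, so the current answer is always available. A minimal illustration, with invented readings:

```python
# Instead of recomputing over the full history (batch), keep a running
# aggregate that updates as each event arrives (streaming).
class RunningMean:
    """Incrementally updated mean -- each event is folded in on arrival,
    so the current value is available in near real time."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count

stream = [120.0, 80.0, 100.0, 95.0]  # e.g. readings arriving one by one
metric = RunningMean()
latest = [metric.update(v) for v in stream]
print(latest[-1])  # mean over everything seen so far: 98.75
```

Streaming engines in the Hadoop ecosystem apply the same principle at scale: state is updated per event or per micro-batch, instead of waiting for a complete data set before processing begins.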
Design for self-service
Interest in self-service access to analytical data continues to grow. “Increasing users’ self-reliance and reducing their dependence on IT are broadly shared goals,” Stodder says. “Nontechnical users—those not well versed in writing queries or navigating data schemas—are requesting to do more on their own.” There is an impressive array of self-service tools and platforms now appearing on the market. “Many tools automate steps for underlying data access and integration, enabling users to do more source selection and transformation on their own, including for data from Hadoop files,” he says. “In addition, new tools are hitting the market that put greater emphasis on exploratory analytics over traditional BI reporting; these are aimed at the needs of users who want to access raw big data files, perform ad-hoc requests routinely, and invoke transformations after data extraction and loading (that is, ELT) rather than before.”
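The ELT pattern Stodder mentions (transformation after extraction and loading, rather than before) can be sketched with a small example: raw text values are loaded as-is, and the casting and aggregation happen at query time inside the store. The table and column names here are invented for illustration:

```python
import sqlite3

# ELT sketch: load raw rows first ("E" and "L"), transform later
# inside the store at query time ("T").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("2015-03-01", "19.50"), ("2015-03-01", "5.25"), ("2015-03-02", "12.50")],
)

# The transformation -- casting text amounts to numbers and aggregating
# per day -- is deferred until a user actually asks the question.
rows = conn.execute(
    """SELECT order_date, SUM(CAST(amount AS REAL))
       FROM raw_orders GROUP BY order_date ORDER BY order_date"""
).fetchall()
print(rows)  # [('2015-03-01', 24.75), ('2015-03-02', 12.5)]
```

Because nothing about the transformation is baked in at load time, a self-service user can invoke a different casting or grouping tomorrow against the same raw rows, which is exactly the flexibility Stodder describes.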
Emphasize data visualization
Nothing gets a point across faster than having data points visually displayed – decision-makers can draw inferences within seconds. “Data visualization has been an important component of BI and analytics for a long time, but it takes on added significance in the era of big data,” Stodder says. “As expressions of meaning, visualizations are becoming a critical way for users to collaborate on data; users can share visualizations linked to text annotations as well as other types of content, such as pictures, audio files, and maps to put together comprehensive, shared views.”
Unify views of data
Users are working with many different data types these days, and are looking to bring this information into a single view – “rather than having to move from one interface to another to view data in disparate silos,” says Stodder. Unstructured data – graphics and video files – can also provide a fuller context to reports, he adds.
The interesting thing is that many of the upstarts do not even intend to take on the market leader in the segment. Christensen cites the classic example of Digital Equipment Corporation in the 1980s, which was unable to make the transition from large, expensive enterprise systems to smaller, PC-based equipment. The PC upstarts in this case did not take on Digital directly – rather they addressed unmet needs in another part of the market.
Christensen wrote and published The Innovator’s Dilemma more than 17 years ago, but his message keeps reverberating across the business world. Lately, Jill Lepore questioned some of the thinking that has evolved around disruptive innovation in a recent New Yorker article. “Disruptive innovation is a theory about why businesses fail. It’s not more than that. It doesn’t explain change. It’s not a law of nature,” she writes. Christensen responded with a rebuttal to Lepore’s thesis, noting that “disruption doesn’t happen overnight,” and that “[Disruptive innovation] is not a theory about survivability.”
There is one point on which Lepore and Christensen can agree: “disruption” is being oversold and misinterpreted on a wide scale these days. Every new product that rolls out is now branded as “disruptive.” As stated above, the true essence of disruption is creating new markets where the leaders would not tread.
Data itself can potentially be a source of disruption, as data analytics and information emerge as strategic business assets. While the ability to provide data analysis at real-time speeds, or make new insights possible isn’t disruption in the Christensen sense, we are seeing the rise of new business models built around data and information that could bring new leaders to the forefront. Data analytics can either play a role in supporting this movement, or data itself may be the new product or service disrupting existing markets.
We’ve already been seeing this disruption taking place within the publishing industry, for example – companies or sites providing real-time or near real-time services such as financial updates, weather forecasts and classified advertising have displaced traditional newspapers and other media as information sources.
Employing data analytics to surface insights never before available within an industry sector may also be part of disruptive innovation. Tesla Motors, for example, is disruptive to the automotive industry because it manufactures entirely electric cars. But a key part of its formula for success is its use of massive amounts of data from its array of in-vehicle devices to assure quality and efficiency.
Likewise, data-driven disruption may be occurring in sectors that have historically been difficult to innovate in. For example, it has long been speculated that some of the digital giants, particularly Google, are poised to enter the long-staid insurance industry. If this were to happen, Google would not enter as a typical insurance company with a new web-based spin. Rather, it would employ new techniques of data gathering, insight, and analysis to offer consumers an entirely new model – one based on data. As Christopher Hernaes recently related in TechCrunch, Google’s ability to collect and mine data on homes, businesses, and autos gives it a unique value proposition in the industry’s value chain.
We’re in an era in which Christensen’s model of disruptive innovation has become a way of life. Increasingly, it appears that enterprises adept at recognizing and acting upon the strategic potential of data may be joining the ranks of the disruptors.
“What really matters about big data is what it does. Aside from how we define big data as a technological phenomenon, the wide variety of potential uses for big data analytics raises crucial questions about whether our legal, ethical, and social norms are sufficient to protect privacy and other values in a big data world.”
These crucial questions, raised in a recent White House report on the implications of big data, frame a growing debate taking place across both society and the business world on how far organizations can push the limits with data collection and analysis. The report, issued by a presidential commission tasked with assessing big data’s privacy implications, explains how big data is a double-edged sword. While big data analytics pave the way to unexpected discoveries, innovations, and advancements in our quality of life, they also carry the potential for abuse. As the report puts it, big data’s capabilities, “most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it.”
The report’s authors acknowledge that big data analytics is an engine of economic growth and a competitive tool for companies across all industries, as well as a means of improving quality of life. “Used well, big data analysis can boost economic productivity, drive improved consumer and government services, thwart terrorists, and save lives,” the report states. In addition, there will likely be a profound impact as data analytics gets applied to the Internet of Things, whose connected devices “have made it possible to merge the industrial and information economies.” In another example, healthcare providers and payers can employ predictive analytics to detect fraud and abuse in real time.
The report’s main thrust is personal privacy implications, and many of these issues will inevitably shape the practices and policies of enterprises as they expand their businesses into the big data realm. The managers and professionals charged with identifying, collecting and analyzing information assets will increasingly be under pressure – as their organizations feel pressure – to understand the boundaries between insight, targeted engagement, and overreach.
For example, a still relatively unexplored area of big data is its ownership. Does data belong to those who collect it, or those who contribute to it? “Big data may be viewed as property, as a public resource, or as an expression of individual identity,” the report states.
Another challenge is the fact that many organizations will opt to assemble massive databases as they move forward with big data analysis. “Big data technologies can derive value from large data sets in ways that were previously impossible — indeed, big data can generate insights that researchers didn’t even think to seek.” For example, new tools and technologies provide for analysis across entire data sets, versus extracting a small representative subset of the data and extrapolating any results against a larger universe. However, with so much data, analysis may potentially be erroneous as well. “Correlation still doesn’t equal causation,” the report’s authors state. “Finding a correlation with big data techniques may not be an appropriate basis for predicting outcomes or behavior, or rendering judgments on individuals. In big data, as with all data, interpretation is always important.”
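The report's warning that correlation is not causation can be made concrete with a small, entirely hypothetical calculation: two series that track each other almost perfectly because they share a common driver, not because one causes the other.

```python
# Toy illustration: ice cream sales and drowning incidents both rise
# through the warm months because of a shared driver (temperature),
# not because either causes the other. All figures are hypothetical.

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

ice_cream_sales = [20, 25, 32, 40, 47, 55]  # units sold, six months (hypothetical)
drownings = [2, 3, 4, 5, 6, 7]              # incidents, same months (hypothetical)

r = pearson(ice_cream_sales, drownings)
print(round(r, 3))  # ~0.998: near-perfect correlation, yet no causal link
```

A model trained naively on such data would "learn" that ice cream predicts drownings, which is exactly the kind of judgment on individuals the report cautions against.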
Another issue is the permanence of data – which also is a privacy issue. At the same time, this may also create headaches for corporate data managers as well. “In the past, retaining physical control over one’s personal information was often sufficient to ensure privacy,” the report states. “Documents could be destroyed, conversations forgotten, and records expunged. But in the digital world, information can be captured, copied, shared, and transferred at high fidelity and retained indefinitely. Volumes of data that were once unthinkably expensive to preserve are now easy and affordable to store on a chip the size of a grain of rice. As a consequence, data, once created, is in many cases effectively permanent. Furthermore, digital data often concerns multiple people, making personal control impractical.”
The report’s authors state that organizations need to take steps to address privacy issues, and suggest de-identification and encryption as technical solutions that are available at this time. However, in the long run, de-identification is still a weak approach to the problem. “Many technologists are of the view that de-identification of data as a means of protecting individual privacy is, at best, a limited proposition. In practice, data collected and de-identified is protected in this form by companies’ commitments to not re-identify the data and by security measures put in place to ensure those protections.”
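As a rough sketch of what de-identification can look like in practice, and why the report calls it a limited proposition, consider replacing a direct identifier with a salted hash. The field names, salt value, and record below are hypothetical; note that the surviving quasi-identifiers (ZIP code, age) can still be linked against outside data sets to re-identify a person.

```python
import hashlib

SALT = b"org-secret-salt"  # hypothetical; if leaked, the pseudonym mapping can be rebuilt

def pseudonymize(record):
    """Replace the direct identifier with a stable, salted pseudonym."""
    out = dict(record)
    digest = hashlib.sha256(SALT + record["email"].encode()).hexdigest()
    out["email"] = digest[:16]  # same input always yields the same pseudonym
    return out

rec = {"email": "jane@example.com", "zip": "10001", "age": 34}
safe = pseudonymize(rec)
print(safe)  # email replaced; zip and age (quasi-identifiers) remain in the clear
```

Even with the direct identifier removed, the combination of ZIP code and age may narrow a record down to a handful of individuals, which is precisely the weakness the report's technologists point to.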
Ultimately, the best methods to ensure the ethical use of data need to come through inspired and forward-thinking management. It takes judicious management, a commitment to training and education, and a focus on what nuggets of information matter the most to the business. Big data opens up many new vistas for enterprises, and those that take the high road will reap its rewards.