People are obsessed with data. Data captured from our smartphones. Internet data showing how we shop and search — and what marketers do with that data. Big Data, which I loosely define as people throwing every conceivable data point into a giant Hadoop cluster with the hope of figuring out what it all means.
Too bad all that attention stems from fear, uncertainty and doubt about the data that defines us. I blame the technology industry, which, in the immortal words of “Cool Hand Luke,” has had a “failure to communicate.” For decades we’ve talked the language of IT and left it up to our direct customers to explain the proper care-and-feeding of data to their business users. Small wonder it’s way too hard for regular people to understand what we, as an industry, are doing. After all, how can we expect others to explain the do’s and don’ts of data management when we haven’t clearly explained it ourselves?
I say we need to start talking about the ABC’s of handling data in a way that’s easy for anyone to understand. I’m convinced we can because — if you think about it — everything you learned about data you learned in kindergarten: It has to be clean, safe and connected. Here’s what I mean:
Data cleanliness has always been important, but assumes real urgency with the move toward Big Data. I blame Hadoop, the underlying technology that makes Big Data possible. On the plus side, Hadoop gives companies a cost-effective way to store, process and analyze petabytes of nearly every imaginable data type. And that’s the problem as companies go through the enormous time suck of cataloging and organizing vast stores of data. Put bluntly, big data can be a swamp.
The question is how to make it potable. This isn’t always easy, but it’s always, always necessary. It begins, naturally, by ensuring the data is accurate, de-duped and complete.
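To make "accurate, de-duped and complete" concrete, here is a minimal sketch in Python. The field names (`email`, `name`, `zip_code`) and the choice of email as the duplicate key are assumptions made purely for illustration; a real pipeline would pick its own schema and matching rules.

```python
# Hypothetical schema: every record must carry these fields to count as complete.
REQUIRED_FIELDS = ("email", "name", "zip_code")


def normalize(record: dict) -> dict:
    """Trim whitespace and lower-case the email so equal values compare equal."""
    cleaned = {k: (v.strip() if isinstance(v, str) else v) for k, v in record.items()}
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def is_complete(record: dict) -> bool:
    """A record is complete when every required field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def clean(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (kept, rejected): incomplete records are rejected, duplicates dropped."""
    seen_emails = set()
    kept, rejected = [], []
    for raw in records:
        record = normalize(raw)
        if not is_complete(record):
            rejected.append(record)  # set aside for review rather than silently lost
            continue
        if record["email"] in seen_emails:
            continue  # duplicate of a record we already kept
        seen_emails.add(record["email"])
        kept.append(record)
    return kept, rejected
```

Keeping the rejected records, rather than discarding them, is a deliberate choice in this sketch: incomplete data usually needs a person or an upstream fix, not a silent delete.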
Now comes the truly difficult part: Knowing where that data originated, where it’s been, how it’s related to other data and its lineage. That data provenance is absolutely vital in our hyper-connected world where one company’s data interacts with data from suppliers, partners, and customers. Someone else’s dirty data, regardless of origin, can ruin reputations and drive down sales faster than you can say “Target breach.” In fact, we now know that hackers entered Target’s point-of-sale terminals through a supplier’s project management and electronic billing system. We won’t know for a while the full extent of the damage. We do know the hack affected one-third of the entire U.S. population. Which brings us to:
Obviously, being safe means keeping data out of the hands of criminals. But it doesn’t stop there. That’s because today’s technologies make it oh so easy to misuse the data we have at our disposal. If we’re really determined to keep data safe, we have to think long and hard about responsibility and governance. We have to constantly question the data we use, and how we use it. Questions like:
- How much of our data should be accessible, and by whom?
- Do we really need to include personal information, like social security numbers or medical data, in our Hadoop clusters?
- When do we go the extra step of making that data anonymous?
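One hedged illustration of that last question: before personal data ever lands in a cluster, a sensitive field can be swapped for a keyed hash. The Python sketch below uses the standard library's `hmac` and `hashlib`; the field name `ssn` and the helper names are hypothetical. Note that this is pseudonymization, not full anonymization: whoever holds the key can still link tokens back to people, so the key needs the same care as the data it protects.

```python
import hashlib
import hmac

# Hypothetical list of fields that should never be stored in the clear.
SENSITIVE_FIELDS = ("ssn",)


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest (stable for a given key)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with sensitive fields replaced by tokens."""
    masked = dict(record)
    for field in SENSITIVE_FIELDS:
        if masked.get(field) is not None:
            masked[field] = pseudonymize(str(masked[field]), key)
    return masked
```

Because the same input and key always yield the same token, records can still be joined on the masked field without exposing the original number.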
And as I think about it, I realize that everything we learned in kindergarten boils down to the ethics of data: How, for example, do we know if we’re using data for good or for evil?
That question is especially relevant for marketers, who have a tendency to use data to scare people, for crass commercialism, or to violate our privacy just because technology makes it possible. Use data ethically, and we can help change the use.
In fact, I believe that the ethics of data is such an important topic that I’ve decided to make it the title of my new blog.
Stay tuned for more musings on The Ethics of Data.
As 2014 is already upon us, here are Marge’s 2014 predictions:
- CMOs will actually be more data-driven than the CIOs: In 2014, CMOs will take over the lead from IT as the organization which most effectively collects, cleanses and leverages data about customers from a wide variety of sources, from databases to CRM systems to digital tools, to gain a full picture of the customer base.
- Convergence of CIOs and CMOs: As marketers’ technology spend increases, CMOs are gaining more power in the digital space and consequently need to work with the CIO more and more. In 2014 we will see the emergence of a new hybrid role where the CIO and CMO roles will merge: the chief digital officer.
- Social media’s equal share: As social media sites come of age (witness Twitter’s IPO), the CMO’s budget in 2014 will be equally distributed between brand, lead generation and social media. Social media becomes as important as lead generation and even drives more leads through the funnel than traditional marketing tactics.
- Marketing automation: more dollars will be spent for programs versus people as marketers drive to create more automated processes in the coming year. 75 percent of marketing will be automated, while 25 percent will be customer unique.
- Custom content: As more and more marketers are creating their own content to drive sales, the barriers between paid, earned and owned media will break down to one integrated content strategy. Currently, 43 percent have a documented strategy—next year more than 60 percent will.
- Redefining ROI: As new platforms for marketing content arise, the definition of ROI will shift to ROE—“Return on Engagement” with customers, turning content into leads and sales; metrics will shift from quantitative to qualitative. The CMO will deliver a social ROE report weekly to the CEO.
- Internet of Things: According to Forrester, 90 percent of consumers who have multiple connected devices switch between the devices to complete tasks—that’s a lot of machine data about consumers and their products. CMOs will need to spend one-third of their time analyzing data and using predictive analytics to make marketing decisions.
- The quantified self: In 2014, mobile will drive more than 50% of the traffic to organizations’ homepages. Companies will need to be mobile-first. With data being pulled from numerous devices and platforms, one winner will emerge in BI in marketing to help collate this information.
- Micro-content: Content will continue to get shorter—even after the boom of the six-second Vine video. Next year, try creating a brand message via a three-second video or a Snapchat photo that lasts on your device no longer than 24 hours.
- Collaboration continues to rule the world: next year an entirely new set of collaboration tools will burst on the scene that can be leveraged across countries and across time zones to make collaboration easier and seamless.
The Potential of Information to inform every one, everything, everywhere is upon us.
For nearly forty years, the term IT has been associated with professional advances in technology and applications. Transactional applications. Systems infrastructure. Network Infrastructure. Applications infrastructure. The irony lies in the fact that all of these technologies have aimed at centralization of information for business outcomes, yet they’ve addressed only 2 percent of the information challenge that employees face today. Most business processes and even the average employee spend only a few minutes of the day in a transactional application. The other 98% of the day is spent in interactions — meetings, analyzing data and collaborating with colleagues. And barely a fraction of the information potential that can be unleashed for people and connected devices has been realized. This year, nearly 4 billion people will be connected on the Internet. And by the year 2020, tens of billions of consumer and industrial devices or smart machines will be online. The potential of information to inform every one, everything, everywhere is upon us.
Information has always been the Killer App. The fourth wave is Information Discovery, Visibility, and Optimization.
The first wave of application development focused on management of business processes. Visibility of transactions. And the target audience was management. The second wave of applications became modernization of these apps for heightened user convenience on the Web or mobile devices—smaller, lighter-weight, intuitive applications. Software as a service (SaaS) was the third wave of applications. It became about user-centricity and near-zero footprint deployment. The fourth wave of applications is about the combination of business, consumer, and device information. These applications will discover the right information, deliver it at the right time to the right person, process, or device. With intuition, elegance, and ease. And improve it over and over again. These applications will be increasingly distributed and lightweight— much like the Internet. Like Java. Like Ethernet. A true information network with an infinitely scalable architecture.
The Information Network. It’s Really That Big. And Needs to Be Really That Small.
The fastest-growing asset class in enterprises, governments, and individuals is information. Digital media. Sensor data. Relationships. Doubling every year, it’s growing faster than Twitter. Faster than cell phones. Faster than the birth, retirement, or death rate.
And it’s increasingly fragmented. Mobile, social, SaaS, and global value chains have accelerated the fragmentation. And the holy grail of consumer, employee, and citizen behavior lives beyond the scope of any single big data cluster—no matter how big. It will be real-time. And batch. Structured and unstructured. For humans and machines. Information will live in a three-tier architecture. Housed in the data center. Aggregated in the field or at the large device. Collected at the point of data. In the device. On the factory floor. In the hospital room.
The Information Network. Architected by Design. Powered by Vibe.
In order to realize this vision, information needs infrastructure and this infrastructure must be built on architecture. Only architecture can handle the range in scale, the diversity, and the accessibility needed for a true information network.
We believe that the foundational element of the information network is Vibe. Vibe is the world’s first embeddable virtual data machine for accessing, aggregating and managing data regardless of source or format. The scalable virtual data machine that today powers data centers, cloud connections, and analytics around the world. And soon, Vibe will power small businesses, departments, applications, and devices everywhere.
Our Mission. Our Vision. Our Purpose.
We believe that architecture is the path to unleashing information potential. One device. Or every device. One business or a value chain. One nation or a global economy. And of course, one person, population, or movement. Our mission is to proliferate that architecture and deliver the industry’s most robust information platform.
Let’s Put Potential to Work. Together.
Hi everyone! Thanks for joining us for part three of three of this conversation. In this segment, Rick and I will talk about the Internet of Things. In the last part of our conversation we covered how quickly data is being generated. You can find Part 1 and Part 2 of this conversation on my Perspectives author page.
MB: The latest topic everyone has been talking about is the Internet of Things, and everything connected to the Internet. Let’s talk about what you think will happen from the machine side. In one example GE talks about their jet engines – a terabyte per day just from a single engine and the kind of optimization and productivity that can come from that type of data control and insight, if you will.
RS: There are a couple stories that relate to the Internet of Things. One that’s fascinating is a company in Boston called Ginger.io that has come up with technology that can predict two days before you get depressed that you’re going to get depressed. When I first heard about this I was pretty skeptical. I met with the head of the company and he explained to me that each of us has a standard pattern of behavior related to travel and activity, and two days before any of us show any outward signs of depression your smart phone can detect a change in your normal pattern.
For example, it determines that your normal radius of travel begins to shrink, the number of emails and tweets that you send goes down and the amount of time you spend at home goes up, etc. He told us that diabetes is highly correlated with depression, and that being depressed is highly correlated with not taking your medicine. And the consequences of not taking your insulin if you have diabetes can be very severe. So people with diabetes are actually installing this program now on their own smartphones and they are setting up an alert that tells their doctor, their kids, their neighbor, their friends just to please check in on them.
Another story is about two MIT computer scientists, John Guttag and Collin Stultz, who created a computer model to analyze formerly discarded EKG data of heart attack patients. By sifting through the massive quantities of data and identifying patterns that lead to greater heart attack risk, they’ve created a model that has the potential to significantly improve today’s risk-screening techniques, which misidentify roughly 70 percent of patients likely to have a repeat heart attack.
MB: Very interesting. So we use the term information potential to relate to all of the things that can happen after a bunch of data is gathered or sourced from somewhere to make it better, make it more ready to make great decisions, to get it to folks at the right time. So when you think about the potential of information in terms of the world what would be the one thing that you would bet on in achieving information potential?
RS: There are so many examples, but there are a couple that I love because I think they’re unexpected. There is one company called ESRI that does very high resolution satellite mapping that governments and cities use for understanding and visualizing cities. ESRI found that there were villages in Nigeria that didn’t exist on any map; no one knew these people were there. The Nigerian government simply didn’t have any record that these people existed. The reason this was particularly important was that the Gates Foundation was working with the Nigerian government to try to eradicate polio. Nigeria is one of the countries in the world where polio has made a major resurgence. By overlapping the satellite imagery with data coming from the 10,000 GPS-enabled cell phones provided by Gates to the inoculation workers, they are now able to map in real-time where these workers have been to make sure that every single family is inoculated. You wouldn’t think of using satellite data to eradicate polio in remote places in the developing world.
And look at the Google car where the vehicle is able to navigate at high speed utilizing existing data about the road and incorporating real time information including radar being bounced off the pavement so the car can “see” what’s happening three cars ahead.
MB: We’re really looking forward to seeing you in June for Informatica World. I think what you’ll see at the conference is another 2,000 people with 10,000 stories on how big data is going to change the world one small company or large company at a time. So, thanks so much for your time and we’re really looking forward to seeing you.
RS: Thank you, I can’t wait.
Thanks for your interest in this conversation between Rick and me. We hope to see you next week in Las Vegas for Informatica World 2013!
Hi everyone, it’s great to be back again this week. I’m continuing my conversation with Rick Smolan about many aspects of big data. In the first part of our conversation, we touched on big data’s impact on healthcare. We last talked about DNA sequencing and the impact it could have on one’s quality of life – letting doctors know what the best course of treatment would be for each individual. You can find the first part of the conversation here. Today, Rick and I are chatting about the impact big data is having on the world, specifically with regard to political movements. Read on for the full discussion.
MB: That was a great example, and one of the things I’ve been thinking about is the impact, if you will, of big data on democratization or movements within the world – political movements that are based on social interaction and social media. Any thoughts or observations in these areas?
RS: So at the beginning of the Human Face of Big Data we open with a couple of very dramatic two-page spreads of photographs to make a point of the magnitude of the sea change that we’re experiencing right now. The first quote in the book is from Google chairman Eric Schmidt. According to Eric, all of the data created by humanity from the dawn of humanity until 2003 was 5 exabytes. Even if you don’t know what an exabyte is, it’s a lot of data – what really brings the point home is his comment that now, every two days, humanity is generating 5 exabytes of data. So anybody listening to that – a housewife, a college professor, a student or a retiree – can grasp the sheer magnitude of how our world is changing because of all the data available.
Another eye-opening quote in the book is that today in a major city like Tokyo, Paris, London or New York City the amount of information that we are exposed to in the course of a single day is equivalent to the amount of information someone in the 15th century experienced in their entire life. Again, a straight vertical line.
Likewise, during the first day of a baby’s life humanity now generates 70 times the amount of information contained in the Library of Congress. Now we’re seeing the same thing happening in the world of politics, where it used to be that the data and information and the ability to analyze what’s going on was really in the hands of large governments and corporations. We’re seeing that the transparency of open data, amplified by social media platforms like Twitter and Facebook, is having an enormous effect on politics, because people now have the ability to get information, respond to it alongside others who are like-minded, and gather to protest injustice, to question their politicians or to read initiatives that they feel should be addressed.
The fact is that the average person is walking around with a broadcast network in their pocket. The Democratization of media is also changing our society hopefully in really positive ways.
Tune in next week to hear Rick and me talk about the Internet of Things in part three of three of this conversation.
Marge Breya, Informatica’s CMO, in conversation with Rick Smolan, former Time, Life, and National Geographic photographer and the CEO of Against All Odds Productions
I’m Marge Breya, the EVP and chief marketing officer at Informatica, and I couldn’t be more delighted to have talked with Rick Smolan about big data and the implications and benefits it can have on the human race. Rick, as many of you know, is the creator of the Human Face of Big Data project and he is going to be one of the featured speakers at Informatica World 2013. Follow this three-part series to read the entire transcript of our conversation.
MB: Hi Rick, welcome to the podcast.
RS: Pleased to be here, thank you.
MB: We have a few minutes of your time and we thought we would just focus on a couple key points that you may be covering and our audience might be interested in, in terms of big data. Let’s start at the beginning. What brought your interest to big data?
RS: I’ve been working on projects for many years with groups of journalists and every 18 months I gather the tribe to do a deep dive on emerging topics such as the Internet. The first year it was touching people’s lives. Another project focused on the global water crisis. Another on the effect of the microprocessor in the course of a typical day. Last year I was at the TED conference and I ran into Marissa Mayer, who was still at Google and is now CEO of Yahoo, and she asked me what my next project was. I told her I was struggling with what interesting emerging topics we should focus on and she said you guys should look at big data.
I asked her to explain that phrase and her analogy piqued my interest. She said big data was like watching the planet grow a nervous system. She explained that through our smart phones, our Google searches, credit card and ATM transactions, for the first time the human race has the ability to collect that data, analyze it, visualize it and respond to it while things are still happening, almost in real-time.
MB: As you began to get started and look into this I’m sure you realized it was more than what you originally thought, for example from everything that’s happening with our citizens to what’s happening in terms of sensors on cars, machines, etc. Any big observations that you came to?
RS: We had almost two hundred people working on the Human Face of Big Data. And as I talked to people so much of what I’m sharing reminded me of what I recall hearing in the early days of the Internet in 1993. People were talking about cyberspace and the World Wide Web. People like Nicholas Negroponte were waxing eloquent that the Internet was going to change every aspect of life on earth and other people were saying that it’s just a better way to look at pornography. In many respects Nicholas Negroponte was prophetic in envisioning how this technology, connecting people in ways never possible before, was going to really alter life on earth.
When I started looking at big data, I was hearing people on the one side saying big data equals Big Brother, just another way to oppress people, to track them and sell them stuff they don’t want. And then on the other hand I spoke with Marissa who saw this as a great benefit to humanity.
Now that I’ve spent a year and a half working with this wonderful team of journalists, writers, photographers, illustrators and researchers I’m much more on the glass half full side. I think there are certainly lots of things that we have to be careful of, like privacy in these oceans of data, but I think the up side offers so much to our species. I deeply believe that big data is going to have 1,000 times more impact than the Internet on our lives. You can see it happening in every aspect of human behavior. Right now we’re in the caveman era of big data and I don’t think it’s going to be called big data for much longer. I think it’s going to be segmented and fragmented and called something else. I think we’re starting to get glimpses of how it’s starting to change our way of responding to how we use our resources much more wisely. My kids are 10 and 13 and my mother just turned 90 and I want to know how all this is going to affect their lives.
MB: This is fascinating, so as you think about Informatica World, this year we will have nearly 2,000 people at the conference, all of whom are committed to sorting out this world of big data, and frankly we use the phrase ‘how to help people achieve or reach their information potential.’ It’s not just from a people standpoint; it could be from a business decision or machine standpoint. I wonder what your thoughts are in terms of any advice that you have for professionals in this industry who are trying to harness this change and help other people achieve that potential.
RS: Well you can almost choose any area of human endeavor, but one where I think we found lots of great stories that people respond to is the healthcare arena. I was at the TEDMED conference last year and Francis Collins, head of the National Institutes of Health, said something that was quite remarkable. He told the story of how when Steve Jobs first got sick a few years ago it cost $100,000 to sequence his DNA to try to understand what he had and what might be the best course of treatment for him. Now, four or five years later, the cost of sequencing someone’s DNA has dropped to about $3,000. Francis Collins believes that in another five years it might cost as little as $40 at Walgreens – like getting a flu shot. And before your doctor prescribes anything to you – even an aspirin – your DNA will have to be sequenced; it’s personalized medicine where the medicine has been tailored specifically for you. And the reason that this is so important is that large pharmaceutical companies for years have been spending hundreds of millions of dollars working on addressing major illnesses that people have. But often when they do clinical trials on humans they find that even though 99% of the people taking the drug will be helped, 1% might be adversely affected or killed by it. And therefore they can never release the product.
Collins’ point was that if each of our individual DNAs were sequenced a doctor would know who would benefit from one of these drugs and who would be harmed. If you’ve ever known anyone who’s gone through cancer or any other illness, it’s often a process of trial and error. The doctors try lots of different things, some of them incredibly unpleasant and in some cases making someone’s life miserable, and then it doesn’t really help. Big data has the potential of ushering in an era of personalized medicine to help doctors go right to the correct treatment instead of all this trial and error. Again it’s the idea of using a bullet instead of a shotgun to treat the problem. And you see this in industry after industry; healthcare is obviously one that affects us all very deeply.
Make sure to check back next week to read part two of three of this transcript.