Category Archives: Big Data
Great data drives profits
We live in an age where technology is intertwined with our personal and business lives. Our machines, devices, and activities on social networks and the web are generating an explosion of data. It is this trove of big data that most executives in commercial and public enterprises believe holds the keys to untold insights. In fact, a study from the Economist Intelligence Unit reports that some 97% of executives believe that unlocking the value from this data is a strategic priority.
This is strongly confirmed in the CIO’s annual survey in 2014, with 72% of respondents citing “data use” as their top priority. To fully grasp the importance of this data, you need to consider what it potentially represents for the business. And that requires a little journey back through our computing history.
Back-office and front-office automation
The first generations of software were focused primarily on automating back-office functions such as finance, manufacturing, and procurement. Data pretty much reflected an event or, more accurately, a transaction.
The same was generally the case for the next wave of front-office applications, such as CRM. Data remained a by-product and was analyzed from time to time to provide trends and insights to optimize business processes. The value, though, lay in how it helped organizations achieve the most profitable outcomes. In this age of productivity, we used it to analyze where we sold best, what products we sold most of, what was the most profitable region or division, etc.
Enter the age of engagement
The last decade has given rise to new technologies and applications that operate in a different fashion. Social platforms like Facebook or mobile phone data no longer represent only transactions or events, but reflect actual records and behaviors. They show what you like or dislike, where you are, and who the members of your household or circle of friends are. Most people refer to this as engagement data.
This engagement data has potentially massive value when you can combine it with the traditional data landscape. If you are interacting with a company (e.g., shopping on a web site), it can feed you relevant pieces of information at the right time, shaping the outcome in a direction they desire. For example, if you shop on Amazon, you likely start with a regular search. But then things change compared to a normal Google search. Suddenly you get recommendations, reviews, alternative products, add-on products, special offers. And every time you click on a link, ensuing recommendations become more personalized and targeted, with new suggestions and offers designed to lead you toward a desired outcome – in their case a purchase.
At the heart of this experience is data — engagement data has become the fuel to deliver the most engaging experience and nudge the user towards a desired outcome. Of course this is not limited to retail, but is applied in every industry.
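The “customers also bought” loop described above can be sketched with a simple co-occurrence count. To be clear, this is a toy illustration, not Amazon’s actual algorithm; the product names and baskets below are invented:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets for a naive "customers also bought" sketch.
baskets = [
    {"camera", "sd-card", "tripod"},
    {"camera", "sd-card"},
    {"camera", "case"},
    {"phone", "case"},
]

# Count how often each pair of products appears in the same basket.
co = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def recommend(product, already_bought=()):
    """Suggest co-purchased items, skipping what the user already owns."""
    scored = {b: n for (a, b), n in co.items() if a == product}
    for item in already_bought:
        scored.pop(item, None)  # don't re-recommend a prior purchase
    return sorted(scored, key=scored.get, reverse=True)

print(recommend("camera"))                             # ranked co-purchases
print(recommend("camera", already_bought=("sd-card",)))  # filtered by history
```

Even this crude sketch shows why engagement data matters: each click or purchase the system records changes what it shows you next.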
Companies are racing to unlock the value from the mass of data they collect in order to build new data-centric products and start fueling their own customer engagements. And to do so faster and better than the competition. But the “Data Directive” study makes a sobering statement in this regard: Only 12% of the executives thought they were highly effective at this and 15% thought they were better than the competition.
The imperative to design for great data
The reason many companies fall short on their strategic intent with data is locked up in our historical approach to data. Data was never the centerpiece of our thinking. The applications were. The relatively few applications representing our back office and even front office looked pretty similar, with similar data structures and concepts. But fueling these types of engagements as they happen requires different thinking and potentially different technologies. The most fundamental aspect is how good the data needs to be for these engagements to succeed.
Let’s consider two examples:
- The Google self-driving car has the same functional systems as a normal car to turn, brake, and accelerate. However, the thing that makes the car drive and steer in a (hopefully) proper manner is data — engagement data — data from the navigation system, sensors, and cameras. How accurate does this data need to be, though?
- And revisiting our Amazon example, what are the chances of you selecting the recommended product if you already purchased that product last week when you were on the site?
There is an imperative for great data: To be able to design our environment in such a way that we can deliver great data continuously to any place or person that needs it. Great data is simply data that is clean, safe, and connected.
- Clean: Data that is accurate, de-duplicated, timely, and complete.
- Safe: Data that is treated with the appropriate level of security and confidentiality based on the policies regulating your business or industry.
- Connected: Data that links all the silos to reflect the whole truth consistently.
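As a rough illustration of the “clean” criteria above (accurate, de-duplicated, timely, complete), here is a minimal sketch in plain Python. The record fields, dates, and staleness threshold are all hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": "2015-04-01"},
    {"id": 2, "email": "a@example.com", "updated": "2015-04-20"},  # duplicate email
    {"id": 3, "email": None, "updated": "2013-01-15"},             # incomplete, stale
]

REQUIRED = ("id", "email", "updated")
MAX_AGE_DAYS = 365                 # "timely" means updated within a year
AS_OF = datetime(2015, 5, 1)       # fixed reference date for the example

def is_clean(rec):
    """Complete: every required field present; timely: updated recently enough."""
    if any(rec.get(f) is None for f in REQUIRED):
        return False
    age = AS_OF - datetime.strptime(rec["updated"], "%Y-%m-%d")
    return age <= timedelta(days=MAX_AGE_DAYS)

def dedupe(recs, key="email"):
    """Keep only the most recently updated record per key value."""
    best = {}
    for r in recs:
        k = r.get(key)
        if k is None:
            continue
        if k not in best or r["updated"] > best[k]["updated"]:
            best[k] = r
    return list(best.values())

clean = [r for r in dedupe(records) if is_clean(r)]
print([r["id"] for r in clean])  # ids of the surviving, clean records
```

Real data-quality tooling does far more than this, of course, but the sketch shows that “clean” is a testable property, not a slogan.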
The challenge for most companies is that they can deliver clean, safe, and connected data for only a single project at a time, through the heroic efforts of scarce IT resources. And this process restarts every time a new request is received.
More often than not, each new project is executed by a different team using different technologies, resulting in poor ROI and slow time-to-value. If we are going to broadly unlock the value in our data to drive the next wave of profits, it is time to think of systematic ways to deliver great data to every process, person, or application. Or risk becoming one of the 85% of companies that lag their competitors in the ability to unlock the strategic value of data.
Some parting thoughts
Consider the following three practical themes as you develop your own great data strategy:
- Designing a system of integration that can continuously deliver great data requires you to standardize across all your use cases, all your projects, and all data types — whether on-premises, in the cloud, or both.
- Establishing end-to-end metadata that is shared across your entire data system. If our data tells us about our business, metadata tells us about our data system.
- Designing for self-service and business-IT collaboration. Data is the responsibility of everyone in the company – not just IT.
For more on the imperative of putting data at the center of your organization’s future efforts, read the ebook, Data Drives Profit.
It’s great to hear that you’re feeling pretty hip at your ability to explain data, metadata and Big Data to your friends. I did get your panicked voicemail asking about the Internet of Things (IoT). Yes, you’re right that the Internet itself is a thing, so it’s confusing to know if the Internet is one of the things that the IoT includes. So let’s make it a bit simpler so you can feel more comfortable about the topic at your bridge game next week. (Still shocked you’re talking about data with your friends; Dad complains you only talk with him about “Dancing with the Stars”.)
First let’s describe the Internet itself. You use it every day when you do a Google search: it’s a publicly accessible network of computer systems from around the world. Someday the Internet will hopefully allow all of us to share human knowledge with one another.
But what about knowledge that’s not “human”? It’s not only people that create information. The “things” in IoT are the devices and machines that people, companies and governments rely upon every day. Believe it or not, there are billions of devices today that are also “connected” – meaning they have the ability to send information to other computers. In fact, the technology research firm Gartner says there will be 4.9 billion connected devices, or “things” in 2015. And by 2020, that number will reach 25 billion!
Some of these connected devices are more obvious than others – and some have been around a very long time. For example, you use ATMs at the bank all the time when you’re withdrawing and depositing money. Clearly those machines can only access your account information if they’re connected to the bank’s computer systems that hold it.
Your iPhone, your “smart” thermostat, and your washing machine with the call-home feature are all connected “Things” too. Mom, imagine waking up in the morning and having your coffee already brewed: the coffee machine started because it knew from your alarm clock what time you were waking up.
IoT will also help make sure your oven is off, let your lost keys be easily found, and let your fridge check how many eggs are left while you’re standing in the grocery store. The possibilities are limitless.
And it doesn’t stop there. Medical devices are communicating with doctors, jet engines are communicating with their manufacturers, and parking meters are communicating with city services.
And guess what? There are people watching the computers that collect all of that information, and they’re working hard to figure out what value they can deliver by using it effectively. It’s actually this machine data that’s going to make Big Data REALLY Big.
So does that mean your espresso maker, your cell phone, and your car will be conspiring to take over the house? Probably not something we need to worry about this year (maybe keep an eye on the refrigerator, just in case). But in the short term, it will mean people like me who have dedicated our careers to data management will have our work cut out for us trying to figure out how to make sense of all of this new machine interaction data from devices. Then how to marry it with the people interaction data from social networks like Facebook, LinkedIn, and Twitter. And then how to marry all of that with the transactional data that companies capture during the normal course of business. As I’ve said before, data’s a good business to be in!
Happy Mother’s Day!
There are lots of really fascinating applications coming out of the big data space as of late, and I recently came across one that may well be the coolest of them all. There’s a UK-based firm that is employing big data to help predict earthquakes.
Unfortunately, predicting earthquakes has thus far been almost impossible. Imagine if people living in an earthquake zone could get at least several hours’ notice, maybe even several days, just as those in the paths of hurricanes get advance warning and can flee or prepare. Hurricane and storm modeling is one of the earliest examples of big data in action, going back decades. The big data revolution may now be on the verge of earthquake prediction modeling as well.
Bernard Marr, in a recent Forbes post, explains how Terra Seismic employs satellite data to sense impending shakers:
“The systems use data from US, European and Asian satellite services, as well as ground based instruments, to measure abnormalities in the atmosphere caused by the release of energy and the release of gases, which are often detectable well before the physical quake happens. Large volumes of satellite data are taken each day from regions where seismic activity is ongoing or seems imminent. Custom algorithms analyze the satellite images and sensor data to extrapolate risk, based on historical facts of which combinations of circumstances have previously led to dangerous quakes.”
So far, Marr reports, Terra Seismic has been able to predict major earthquakes anywhere in the world with 90% accuracy. Among them is a prediction, issued on February 22nd, that a 6.5-magnitude quake would hit the Indonesian island of Sumatra. The island was hit by a 6.4-magnitude quake on March 3rd.
There’s no question that the ability to accurately forecast earthquakes – at least as closely as hurricanes and major blizzards can be predicted – will not only save many human lives, but also be invaluable to government agencies and businesses as well.
At the same time, such creative – and potentially game-changing – applications of big data provide vivid examples of how data can be converted into insights that were never possible before. Many business leaders are looking for ways to shine a light on potential events within their organizations and markets, and examples such as Terra Seismic accentuate the positive benefits big data can deliver.
Terra Seismic’s forecasts are available through a public website: http://quakehunters.com/
Partners play an instrumental role in Informatica’s business and have for many years. But there are some years when unique partnerships truly blossom and both companies come together to do really special things that neither could have conceived of alone. And that is the case in 2015 with our partnership with Cisco.
On April 28, Informatica was awarded ISV Partner of the Year for the Americas by Cisco in Montreal, Canada.
This year the Cisco award was given for two jointly created solutions. The first is an end-to-end Data Warehouse Optimization (DWO) solution. By combining Cisco UCS (Unified Computing System) with Hadoop, Informatica Big Data Edition (BDE), and Cisco Data Virtualization, a customer gains access to a powerful next-generation big data analytics platform for both structured and unstructured data.
This solution was created to help customers reduce both CAPEX and OPEX IT expenditures driven by rising Enterprise Data Warehouse costs. By offloading infrequently used or dark data, along with ETL (extract, transform, and load) jobs and mappings, into the Data Lake / Hadoop, a customer can realize a 5-10X cost reduction.
The second solution recognized by Cisco was a jointly created Internet of Things (IoT) / Internet of Everything (IoE) offering. With the explosion of sensor, social, and internet-based data, the two companies recognized the need to create a solution that would incorporate data from “the edge” into a customer’s mainstream data repositories (EDW, BDE, Hadoop, etc.).
This solution includes Cisco routers and hardened devices to collect sensor data (i.e., telemetry data), coupled with Informatica’s real-time data ingestion and analytics capabilities. By combining these technologies, a customer is able to aggregate data from every source where they capture it, gaining a 360-degree view of their business for competitive advantage.
From our announcement in February, “The Data Warehouse Optimization solution is about enabling organizations to more easily leverage all their data assets – current and historical, transaction and interaction – for more effective analytics while reducing their data management costs,” said Mike Flannagan, vice president and general manager of Data and Analytics at Cisco. “More than the sum of its parts, the solution’s Cisco and Informatica elements work synergistically to meet the demands of big data, to respond quickly to changing information needs, and to deliver insights that drive increased competitiveness and business innovation.”
Every organization on the planet is working hard to gather, analyze and make decisions based on their data. The joint Informatica and Cisco solutions are critical to helping customers to become Data Ready enterprises today and for years to come.
Moving forward, Cisco and Informatica will continue to collaborate on how to best build and deliver on premise, cloud-based and hybrid solutions so that customers have best-of-breed solutions to solve their exploding data and complex IT problems.
According to Gartner, 64% of organizations surveyed have purchased or were planning to invest in Big Data systems. More and more companies are diving into their data, trying to put it to use to minimize customer churn, analyze financial risk, and improve the customer experience.
Of that 64%, 30% have already invested in Big Data technology, 19% plan to invest within the next year, and another 15% plan to invest within two years. Less than 8% of Gartner’s 720 respondents, however, have actually deployed Big Data technology. This is troubling, because it suggests most companies simply don’t know what they’re doing when it comes to Big Data.
Over the years, we have heard that Big Data is about Volume, Velocity, and Variety. I believe this limited view is one of the reasons why, despite the Big Data hype, most companies are still stuck in neutral.
- Volume: Terabytes to petabytes, exabytes to zettabytes; in short, lots of data
- Velocity: Streaming data, milliseconds to seconds, how fast data is produced, and how fast the data must be processed to meet the need or demand
- Variety: Structured, unstructured, text, multimedia, video, audio, sensor data, meter data, html, text, e-mails, etc.
For us, the focus is on the collection of data. After all, we are prone to be hoarders, wired by our survival instinct to collect and stockpile for the leaner winter months that may come. So we hoard data, as much as we can, for the elusive “What if?” scenario: “Maybe this will be useful someday.” It’s this stockpiling of Big Data without application that makes it useless.
While Volume, Velocity, and Variety are focused on collection of data, Gartner, in 2014, introduced 3 additional Vs: Veracity, Variability, and Value which focus on usefulness of the data.
- Veracity: Uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations, accuracy, quality, truthfulness or trustworthiness
- Variability: The differing ways in which the data may be interpreted, different questions require different interpretations
- Value: Data for co-creation and deep learning
I believe that perfecting as few as 5% of the relevant variables will get a business 95% of the same benefit. The trick is identifying that viable 5%, and extracting meaningful information from it. In other words, “Value” is the long pole in the tent.
Big data is receiving a lot of press these days, including from this author. While there continues to be constructive dialog regarding whether volume, velocity, or variety is the most important attribute of the big data movement, one thing is clear. Constructed correctly, big data has the potential to transform businesses by increasing sales and operational efficiency. More importantly, when combined with predictive analytics, big data can improve customer experience, enable better targeting of potential customers, and improve the core business capabilities that are foundational to a business’s right to win.
The problem many in the vanguard have discovered is that big data projects are fraught with risk if they are not built upon a solid data management foundation. During the Big Data Summit, you will learn directly from the vanguard of big data: hear how they have successfully transitioned from the traditional world of data management to a new world of big data analytics. Hear from market-leading enterprises like Johnson and Johnson, Transamerica, Devon Energy, KPN, and Western Union. As well, hear from Tom Davenport, Distinguished Professor in Management and Information Technology at Babson College and bestselling author of “Competing on Analytics” and “Big Data at Work”. Tom will share his perspective from interviewing hundreds of companies about the successes and failures of their big data initiatives; he initially thought big data was just another example of technology hype, but his research changed his mind. And finally, hear from big data thought leaders including Cloudera, Hortonworks, Cognizant, and Capgemini. They are all here to share their stories on how to avoid common pitfalls and accelerate your analytical returns in a big data world.
To attend in person, please join us on Tuesday the 12th at 1:30 in Las Vegas at the Big Data Summit. If you cannot join us in person, I will be sharing live tweets and videos on Twitter starting at 1:30 PST. Look for me at @MylesSuer on Twitter to follow along.
What is Big Data and why should your business care?
Big Data: Does the emperor have their clothes on?
Should We Still be calling it Big Data?
CIO explains the importance of Big Data to healthcare
Big Data implementations need a systems view and to put in place trustworthy data.
The state of predictive analytics
Analytics should be built upon Business Strategy
Analytics requires continuous improvement too?
When should you acquire a data scientist or two?
Who Owns Enterprise Analytics and Data?
Competing on Analytics: A Follow Up to Thomas H. Davenport’s Post in HBR
Thomas Davenport Book “Competing On Analytics”
Analytics Stories: A Banking Case Study
Analytics Stories: A Financial Services Case Study
Analytics Stories: A Healthcare Case Study
Author Twitter: @MylesSuer
The unintended consequences of big data, real-time data, and loose data security all converged on the Twitter Q1 earnings release this week, showing how not having good control of your data can cause some bad things to happen.
What happens when information is accidentally made public? In this case the damage might seem small (only 18% shaved off Twitter’s share price), but this amounts to roughly a $5 billion drop in valuation and millions changing hands for some investors. Given that earnings were a miss, maybe the drop in stock value would have happened anyway, but the surprise element may have increased the impact of the news.
This is a great example for anyone wondering about data security and what it means to properly manage public or private data. This episode will pass, but it will leave a blemish on Twitter’s reputation and should lead the company to take a closer look at how it manages data that is publicly accessible or meant to be secured.
In this case the data leak occurred when Selerity, a company providing real-time content analytics that specializes in financial market data and sentiment, used one of its web crawlers to pick up the Twitter earnings release from a PDF posted on Twitter’s public investor relations website about an hour before Twitter officially pushed it out. Selerity simply reported information that sat on a hidden URL but was effectively public, since it was not secured. (A great lesson for many non-technical marketing and PR people.)
My favorite quote from Selerity comes from The Verge. “Any time a company’s earnings are due for release, we check the website periodically to see if the earnings are available. In this instance, I am assuming that Twitter mistakenly posted the earnings to the website early. But they did make the earnings available on the website.”
Earnings have been accidentally released many times before, and this will not be the last. The good news is there are any number of simple ways to stop or control data so that it is not accidentally or purposely made publicly accessible, using both data publishing best practices and data management and security products. This is a good reminder to companies to review their data publishing, management, and security practices and policies. Do nothing, and your company could be the next one being talked about.
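One of those simple controls can be as basic as checking, before any file is posted or crawled, whether it is still under embargo. The sketch below is a hypothetical illustration, not a substitute for real data-security products; the paths and release times are invented:

```python
from datetime import datetime

# Hypothetical embargo policy: files matching these paths must not be
# publicly reachable before their scheduled release time.
EMBARGOES = {
    "/ir/q1-2015-earnings.pdf": datetime(2015, 4, 28, 16, 0),  # after market close
}

def may_publish(path, now):
    """Return True only if the path has no embargo, or its embargo has passed."""
    release = EMBARGOES.get(path)
    return release is None or now >= release

# A pre-release crawl of your own site should find nothing embargoed:
assert not may_publish("/ir/q1-2015-earnings.pdf", datetime(2015, 4, 28, 10, 0))
assert may_publish("/ir/q1-2015-earnings.pdf", datetime(2015, 4, 28, 17, 0))
assert may_publish("/press/old-release.pdf", datetime(2015, 4, 28, 10, 0))
```

Had a check like this gated Twitter’s publishing pipeline, the PDF would never have been reachable an hour early, crawler or no crawler.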
While CIOs are urged to rethink their backup strategies following warnings from leading analysts that companies are wasting billions on unnecessary storage, consultants and IT solution vendors are selling “Big Data” narratives to these CIOs as a storage optimization strategy.
What a CIO must do is ask:
Do you think a backup strategy is the same as a Big Data strategy?
Is your MO – “I must invest in Big Data because my competitor is”?
Do you think Big Data and “data analysis” are synonyms?
Most companies invest very little in their storage technologies, while spending on server and network technologies primarily for backup. Further, the most common mistake businesses make is failing to update their backup policies. It is not unusual for companies to be using backup policies that are years or even decades old, policies that do not discriminate between business-critical files and the personal music files of employees.
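A backup policy that does discriminate can be expressed in a few lines. This is a hypothetical sketch of such a filter; the file extensions and directory names are invented, and a real policy would be far richer:

```python
import os

# Hypothetical policy rules: adjust to your own business.
SKIP_EXTENSIONS = {".mp3", ".m4a", ".mov", ".avi"}        # personal media: never back up
CRITICAL_DIRS = {"finance", "contracts", "crm-exports"}   # business-critical locations

def should_back_up(path):
    """Discriminate business-critical files from employees' personal media."""
    _, ext = os.path.splitext(path)
    if ext.lower() in SKIP_EXTENSIONS:
        return False
    # Back up only files that live under a business-critical directory.
    return any(part in CRITICAL_DIRS for part in path.split(os.sep))
```

The point is not the specific rules but that the policy is explicit, reviewable, and easy to update, which is exactly what decade-old backup policies are not.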
Even Web giants like Facebook and Yahoo generally aren’t dealing with Big Data. They run their own giant, in-house “clusters” – collections of powerful servers – for crunching data. But it appears that those clusters are unnecessary for many of the tasks they’re handed. In the case of Facebook, most of the jobs engineers ask their clusters to perform are in the “megabyte to gigabyte” range, which means they could easily be handled on a single computer – even a laptop.
The necessity of breaking problems into many small parts, and processing each on a large array of computers, characterizes classic Big Data problems like Google’s need to compute the rank of every single web page on the planet.
In “Nobody ever got fired for buying a cluster,” Microsoft Research points out that a lot of the problems solved by engineers at even the most data-hungry firms don’t need to be run on clusters. Why is that a problem? Because there are vast classes of problems for which these clusters are a relatively inefficient, or even inappropriate, solution.
Here is an example of a post exhorting readers to “Incorporate Big Data Into Your Small Business” that is about a quantity of data that probably wouldn’t strain Google Docs, much less Excel on a single laptop. In other words, most businesses are dealing with small data. It’s very important stuff, but it has little connection to the big kind.
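To make the point concrete, a “megabyte to gigabyte” job like counting page hits in a web log needs no cluster at all; a single streaming pass on one laptop runs in constant memory. A minimal sketch, assuming a hypothetical CSV log with a `page` column:

```python
import csv
from collections import Counter

def top_pages(log_path, n=10):
    """Stream a CSV web log row by row, counting hits per page.
    Memory use stays constant no matter how large the file is,
    so even a multi-gigabyte log fits comfortably on one machine."""
    counts = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):  # assumes a 'page' column exists
            counts[row["page"]] += 1
    return counts.most_common(n)
```

The column name and log layout are assumptions for illustration; the design point is simply that a sequential pass, not a cluster, handles gigabyte-scale aggregation.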
Let us lose the habit of putting “big” in front of data to make it sound important. After all, supersizing your data, just because you can, is going to cost you a lot more and may yield a lot less.
So what is it? Big Data, small Data, or Smart Data?
Gregor Mendel uncovered the secrets of genetic inheritance with just enough data to fill a notebook. The important thing is gathering the right data, not gathering some arbitrary quantity of it.