Category Archives: Big Data
Are data lakes a good thing?
This was the debate going back and forth at the recent Data Summit, held in New York. Interestingly, the roster of speakers, representing a range of industry experts, was sharply divided on the value of data lakes to enterprises. Some saw data lakes (central repositories of raw data that is collected as-is, then structured and processed later, when an application needs it) as risky business, while others regarded them as the logical way to make the most of the big data tsunami.
In a keynote panel discussion early in the conference, Miles Kehoe, search evangelist at Avalon Consulting, and Anne Buff, business solutions manager at the SAS Institute, expressed caution about data lakes. Buff, for one, said data lakes were great technology tools but didn’t make sense for the business. “I argue vehemently against it,” she said. “Not because it isn’t valuable. From an analytics standpoint it’s a great playground or sandbox because it’s this utopia of putting our data in one place and make it naked, make it raw so we could do whatever we want with it,” Buff said. “But that’s the biggest risk you could imagine. Let’s put every piece of data we ever had in our company in one place, and tell everybody about it.”
The problem, Buff continued, was data security and privacy. The only insurance against these risks is an organization with a “good data governance program where people respect data,” along with certification for the individuals who handle it. However, Buff continued, such best practices are not very common in enterprises. She referred to the notion of secure data lakes as a “utopic belief that we can get all data in one place.”
Kehoe agreed with Buff, comparing the idea of the data lake to the “x” drive that was part of earlier PC networks. He cautioned that organizations may not have enough control over the content of the data being stored within a data lake. “You’re putting stuff there, and you don’t know what it is,” he said. “You may have things that expose you to sexual harassment lawsuits, for example. Can you imagine people copying their files, and shoving it up to a file share somewhere, so it’s publicly available, with no security.”
Some experts say the idea of having such an x drive, or “a big dumb disk,” is fine. “If that’s what it takes to get data there, by all means, put it on that dumb disk,” said David Mariani, CEO of AtScale.
Mariani joined a panel later that day, which I moderated, that also included Wendy Gradek, senior manager with EMC, and Andy Schroepfer, chief strategy officer at Hosting, both of whom expressed great support for the data lake concept.
Data lakes help address what is arguably the greatest challenge for many enterprises today: disparate data sources and the inertia they create within the organization, said Gradek. “I don’t know how many times I’ve been told the information we need is six months out, or it’s about a year out. That’s not going to work for the business — their goals are very much weekly driven, especially in sales, where if you don’t make your numbers, and you don’t have visibility into your data, you’re running blind.” The key to resolving this is supporting disparate data sources in a single enterprise location, she continued. “We need it to be in a central repository in its original state, so when we have those questions we can go to it and apply the logic as close to query time as possible and get what we need quickly.”
Mariani agreed, noting how he “came to a realization that data movement is evil. Data is like water. It’s very expensive and difficult to move once it lands someplace.” Today’s data volumes have “grown beyond our ability to pre-process it or to pre-structure it, to build structures to answer questions today.”
Schroepfer stated that it’s better to have data in one place, “as opposed to distributed data sitting on different peoples’ desktops, sitting in different peoples’ Excel spreadsheets. To me, that’s far worse than having a centralized store where you can lock it down and provide access. It’s as good and as clean as you want it to be.”
(Disclosure: the author is a contributor to Database Trends & Applications, published by Information Today, Inc., host of the Data Summit mentioned above.)
Two weeks ago I had the pleasure of attending the InformaticaWorld 2015 conference in Las Vegas. With more than 2,500 attendees talking about all things data, this was for sure the time to get your data geek out. There was plenty of talk about data governance, intelligent data platforms, big data, predictive and prescriptive analytical processes, and ultra-fast message ingestion from devices on aircraft engines. But in the midst of all of the cool data talk, a CIO from one of our customers gave us data people a timely reminder to “Fisher-Price our insights”, that is, to keep it simple.
From systems of record to systems of engagement to systems of insight
Geoffrey Moore has for years been describing the future of IT and how our systems have changed from mainly record keeping to front-line engagement with customers and suppliers. Today we live in a world where nearly every possible interaction or event is digitized in some form through very addictive social media, sensors, and the like. And this deluge of information is opening up a new world of learning and insight into behaviors ranging from buyer journeys to personalized healthcare. Perhaps we are now entering the systems-of-insight stage, as we put this data to work to drive better outcomes.
The purpose for generating insights
As a data geek, it has always been fun to dig into the data, find the hidden patterns, and build out the algorithms for my correlations. But that CIO reminded me that, most of the time, our analyses serve not our own purpose but that of the business or agency we work for.
I have been working in the data space for 30+ years and have built data warehouses and managed business intelligence and market intelligence teams. One of the toughest jobs I have found is distilling an insight down to its core essence when you communicate with executives or your board. It is not that they are not smart enough, but that they might not be steeped in the data and methods. And above all, they don’t have time to pore over the data as you do for hours each day.
A few thoughts on simplifying insights
While there is a sea change in the C-suites and boardrooms all over the world as management gets more data savvy and shifts from “knowing it all” to “knowing what questions to ask”, we can help our cause with some simple considerations.
- Charts are better. This seems to be a popular line and in my experience it holds true, especially if you can make it a bar chart with a big bar and a small bar.
- Let the data tell a story. Not in a single chart with every possible insight on one slide, but in a sequence of easily digestible charts to illustrate the learnings.
- Use business language. The more you can connect your insight and language to the language the business speaks, the easier the conversation goes.
- Review with a stakeholder-coach. You have one shot to get your point across.
- Dumbing it down does not mean removing the insights. Perhaps this does not entirely fit in my list, but asking tough questions and sharing unpopular insights takes courage. Don’t hide the truth behind over-simplifications.
Need for superpowers of the data geeks
For sure the list above could be much longer, but it also illustrates some of the challenges data scientists and data geeks alike face. The desired skills are not just the superpowers of math, stats, and computer science, but the knowledge of business, psychology, economics, and the like that makes those superpowers really super. This of course raises the question of what the best education is to train our data scientists. We can hold that discussion for another day.
While it is important to simplify business insights when we present to the executive teams on the business side, it is equally important when we build business cases and justifications for our own data projects in IT. Whether you are the data scientist or the CIO, I believe the advice from our CIO remains equally strong. Keeping it simple and keeping it in small increments will help you justify your project better. Please share your ideas for keeping it simple and how you coach your teams in this process.
For more background on delivering great data to power your business, read our ebook Data Drives Profit.
At Big Data Summit / Hadoop World last month, I was astonished by all the new lingo and the sheer volume of words in the marketing collateral handed out by vendors. So much so that I pledged to aggressively cut down my words when I write about Big Data. This should be easy in theory but, as I discovered, is quite difficult given all the precise language needed to explain detailed concepts to techies.
So, while doing research on how I might accomplish this, I came across a poignant example (erroneously attributed to Ernest Hemingway): after a bet with friends, he was said to have delivered an entire novel in only six words.
He then wrote the following six words on a napkin:
For sale: Baby shoes, never worn
And then collected his winnings from his friends.
After reading that, I was challenged: what if I had only six words to describe Big Data, or Cloud Computing, or Real Time Data Integration for that matter?
After a few futile attempts, out came this: “Cloud Computing Delivers Computing Like Utilities.” I then concluded that six words is fairly difficult, but not impossible.
But techies still need more depth (and maybe more fun), so I thought about haiku, the Japanese poetic form that consists of only three lines. The first line is 5 syllables, the second is 7, and the third is 5 again. More importantly, haiku has both depth and fun.
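For the technically curious, the 5-7-5 structure is easy to check mechanically. Here is a toy Python sketch using a deliberately naive syllable heuristic (count vowel groups, discount a trailing silent “e”); real English syllabification needs a pronunciation dictionary, so treat this as a rough check only:

```python
import re

def estimate_syllables(word):
    """Naive heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def is_haiku(poem):
    """Check the 5-7-5 syllable structure of a three-line poem."""
    lines = [l for l in poem.strip().splitlines() if l.strip()]
    if len(lines) != 3:
        return False
    counts = [sum(estimate_syllables(w) for w in re.findall(r"[A-Za-z']+", l))
              for l in lines]
    return counts == [5, 7, 5]
```

A quick try on a made-up data haiku (“data flows like streams / into the lake of raw truth / insight waits below”) passes the check, though the heuristic will miscount trickier words.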
Using this format, I tried to explain the value of big data computing, this time with a little spark.
That was fun. How about a Haiku on the value proposition of big data real time streaming tools:
Then a pair on what proactive monitoring of big data might do to help a sys admin… with an ominous warning thrown in for good measure:
Then this Haiku advocating why you may be interested in Customer 360 visibility:
Since a big part of the value proposition of the Big Data Edition and Vibe Data Stream relates to security and intrusion detection, I thought this Haiku might resonate with folks:
Now that I was getting the hang of it, I thought about what real time customer 360 views might mean to CMOs or CEOs and out popped this:
Then finally just for the heck of it, 2 haiku for Marge Breya, our CMO about what Informatica provides for the world at large (with a little edge to it):
And because Informatica has been making great data ready for analytics for 20 years, this one is fairly apparent.
So, my conclusion is that while haiku is not the optimal communications medium for techies, it’s clear to me that tech vendors and thought leaders should strive for simplicity and get to the point more quickly about the value that Big Data tools can have for businesses. Let us know how we’re doing.
Meanwhile, if you’d like to take a look at the real software and be so inspired, here’s a downloadable version you can try out for free, so you too can be inspired to haiku.
Among those changes: Marketing will take a bigger role in customer experience, directly impacting an organization’s competitiveness. If marketing is driving customer experience, it will need the right technology, with IT as a partner for success.
The trouble is: That partnership is lagging. A 2013 Accenture study found that 90% of CIOs and CMOs do not believe collaboration between their areas is sufficient. It’s time for IT to rapidly align with marketing.
What can IT expect from the marketing side of the shop in the next five years? Three things emerge: The rise of the customer experience, the increase in marketing analytics, and personalized customer messaging.
Put customer experience first. If, as Gartner predicts, all organizations will soon compete primarily on customer experience, marketing needs to align with IT to ensure competitiveness and growth. In one Gartner survey, 50% of respondents said marketing controlled the biggest chunk of the customer experience budget. Driving success by exceeding customer expectations requires a single view of customers and valid data. Many marketing technology tools are available to provide a great customer journey, and IT can find value in understanding and suggesting these tools.
Marketing is more analytical than ever. The perception that marketing dollars don’t yield measurable results is in the past. Technology tools create a wealth of ‘ah-hah’ moments. Marketers question “Why?” at every move. Marketers can measure their activities, so instead of a murky budget with unknown ROI, marketers try new things, measure them, see what is effective, and invest where things work. The reason is technology. Marketers are moving away from branding and creative activities to be data driven. When asked what skills they need to develop, marketers put ‘advertising/branding’ and ‘creative/graphic arts’ at the bottom of the list in a 2015 survey from The Economist.
Messaging is more personalized and segmented. Organizations put a premium on the data they have, but what about the data they could have? Enriching customer profiles with third-party data and marrying that with interactions allows organizations to personalize the experience. Marketing leaders use sophisticated personalization tools for a one-to-one conversation. This drives customer messaging and engagement, while eliminating a messaging strategy based on guesswork. Without this kind of targeting, enabled by technology and accurate data, organizations can’t stay competitive.
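The enrichment pattern described here, layering third-party attributes and interaction history onto a first-party customer profile, can be sketched in a few lines of Python. The field names below are hypothetical, purely for illustration:

```python
def enrich_profile(profile, third_party, interactions):
    """Merge a first-party profile with third-party attributes and
    recent interactions; first-party values win on any conflict."""
    enriched = {**third_party, **profile}  # profile keys override third-party keys
    enriched["recent_interactions"] = interactions[-5:]  # keep only the latest few
    return enriched
```

Keeping first-party data authoritative on conflicts is one common design choice; some teams instead score each source for trustworthiness per attribute.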
The common thread with all of these marketing trends and the need for IT to help achieve them comes back to data. IT can relate to marketing activities through the data they acquire and retain, and soon both the marketing and IT areas will find alignment by being highly focused on the quality of their data.
Co-authored by Thomas Brence, Director of Product Marketing at Informatica.
Great data drives profits
We live in an age where technology is intertwined with our personal and business lives. Our machines, devices, and activities on social networks and the web are generating an explosion of data. It is this trove of big data that most executives in commercial or public enterprises believe holds the keys to untold insights. In fact, a study from the Economist Intelligence Unit reports that some 97% of executives believe that unlocking the value of this data is a strategic priority.
This is strongly confirmed by the 2014 annual CIO survey, in which 72% of respondents named “data use” as their top priority. To fully grasp the importance of this data, you need to consider what it potentially represents for the business. And that requires a little journey back through our computing history.
Back-office and front-office automation
The first generations of software focused primarily on automating back-office functions such as finance, manufacturing, and procurement. Data pretty much reflected an event or, more accurately, a transaction.
The same was generally the case for the next wave of front-office applications, such as CRM. Data remained a by-product and was analyzed from time to time to provide trends and insights to optimize business processes. The value, though, lay in how it helped organizations achieve the most profitable outcomes. In this age of productivity, we used data to analyze where we sold best, which products we sold most of, which region or division was most profitable, and so on.
Enter the age of engagement
The last decade has given rise to new technologies and applications that operate in a different fashion. Data from social platforms like Facebook or from mobile phones no longer represents transactions or events only, but reflects actual behaviors. It shows what you like or dislike, where you are, and who the members of your household or circle of friends are. Most people refer to this as engagement data.
This engagement data has potentially massive value when you can combine it with the traditional data landscape. If you are interacting with a company (e.g., shopping on a web site), it can feed you relevant pieces of information at the right time, shaping the outcome in the direction it desires. For example, if you shop on Amazon, you likely start with a regular search. But then things change compared to a normal Google search. Suddenly you get recommendations, reviews, alternative products, add-on products, special offers. And every time you click on a link, the ensuing recommendations become more personalized and targeted, with new suggestions and offers designed to lead you toward a desired outcome, in Amazon’s case a purchase.
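The “customers who bought X also bought Y” behavior described above is often implemented with simple item co-occurrence counts over purchase baskets. A toy Python sketch of that idea follows; this is an illustration of the general technique, not Amazon’s actual (far more sophisticated) algorithm:

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(set(basket), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(owned, co, k=3):
    """Score candidates by total co-purchases with items the user already has."""
    scores = Counter()
    for item in owned:
        for other, n in co[item].items():
            if other not in owned:  # don't re-recommend what they own
                scores[other] += n
    return [it for it, _ in scores.most_common(k)]
```

Excluding already-owned items is exactly the point raised later about recommending a product the customer bought last week: without connected purchase data, the engine cannot apply that filter.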
At the heart of this experience is data — engagement data has become the fuel to deliver the most engaging experience and nudge the user towards a desired outcome. Of course this is not limited to retail, but is applied in every industry.
Companies are racing to unlock the value from the mass of data they collect in order to build new data-centric products and start fueling their own customer engagements, and to do so faster and better than the competition. But the “Data Directive” study makes a sobering statement in this regard: only 12% of the executives thought they were highly effective at this, and just 15% thought they were better than the competition.
The imperative to design for great data
The reason many companies fall short of their strategic intent with data lies in our historical approach to data. Data was never the centerpiece of our thinking; the applications were. The relatively few applications representing our back office, and even our front office, looked pretty similar, with similar data structures and concepts. But fueling these new types of engagement as they happen requires different thinking and potentially different technologies. The most fundamental question is how good the data needs to be for this to succeed.
Let’s consider two examples:
- The Google self-driving car has the same functional systems as a normal car to turn, brake, and accelerate. However, the thing that makes the car drive and steer in a (hopefully) proper manner is data: engagement data from the navigation system, sensors, and cameras. How accurate does this data need to be, though?
- And revisiting our Amazon example, what are the chances of you selecting the recommended product if you already purchased that product last week when you were on the site?
There is an imperative for great data: To be able to design our environment in such a way that we can deliver great data continuously to any place or person that needs it. Great data is simply data that is clean, safe, and connected.
- Clean: Data that is accurate, de-duplicated, timely, and complete.
- Safe: Data that is treated with the appropriate level of security and confidentiality based on the policies regulating your business or industry.
- Connected: Data that links all the silos to reflect the whole truth consistently.
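As a small illustration of the “clean” criterion, here is a naive Python sketch that de-duplicates customer records on a normalized email key and keeps the most complete version of each record. The matching rule is a placeholder; real master data management uses much richer matching and survivorship logic:

```python
def completeness(rec):
    """How many fields carry a non-empty value."""
    return sum(1 for v in rec.values() if v)

def clean_records(records):
    """De-duplicate on a normalized email key, keeping the fullest record.

    Records without a usable key are dropped as incomplete."""
    seen = {}
    for rec in records:
        key = rec.get("email", "").strip().lower()
        if not key:
            continue  # incomplete: no match key to de-duplicate on
        if key not in seen or completeness(rec) > completeness(seen[key]):
            seen[key] = rec
    return list(seen.values())
```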
The challenge for most companies is that they can deliver clean, safe, and connected data only one project at a time, through the heroic efforts of scarce IT resources. And this process restarts every time a new request comes in.
More often than not, each new project is executed by a different team using different technologies, resulting in poor ROI and slow time-to-value. If we are going to broadly unlock the value in our data to drive the next wave of profits, it is time to think of systematic ways to deliver great data to every process, person, or application. Otherwise we risk becoming one of the 85% of companies that lag their competitors in the ability to unlock the strategic value of data.
Some parting thoughts
Consider the following three practical themes as you develop your own great data strategy:
- Designing a system of integration that can continuously deliver great data. This requires you to standardize across all your use cases, all your projects, and all data types, whether on-premises, in the cloud, or both.
- Establishing end-to-end metadata that is shared across your entire data system. If our data tells us about our business, metadata tells us about our data system.
- Designing for self-service and business-IT collaboration. Data is the responsibility of everyone in the company – not just IT.
For more on the imperative of putting data at the center of your organization’s future efforts, read the ebook, Data Drives Profit.
It’s great to hear that you’re feeling pretty hip at your ability to explain data, metadata and Big Data to your friends. I did get your panicked voicemail asking about the Internet of Things (IoT). Yes, you’re right that the Internet itself is a thing, so it’s confusing to know whether the Internet is one of the things that the IoT includes. So let’s make it a bit simpler so you can feel more comfortable about the topic at your bridge game next week. (Still shocked you’re talking about data with your friends; Dad complains you only talk with him about “Dancing with the Stars.”)
First let’s describe the Internet itself. You use it every day when you do a Google search: it’s a publicly accessible network of computer systems from around the world. Someday the Internet will hopefully allow all of us to share all human knowledge with one another.
But what about knowledge that’s not “human”? It’s not only people that create information. The “things” in IoT are the devices and machines that people, companies and governments rely upon every day. Believe it or not, there are billions of devices today that are also “connected” – meaning they have the ability to send information to other computers. In fact, the technology research firm Gartner says there will be 4.9 billion connected devices, or “things” in 2015. And by 2020, that number will reach 25 billion!
Some of these connected devices are more obvious than others, and some have been around a very long time. For example, you use ATMs all the time at the bank when you’re withdrawing and depositing money. Clearly those machines can only access your account information if they’re connected to the bank’s computer systems that hold it.
Your iPhone, your “smart” thermostat, and your washing machine with the call-home feature are all connected “things” too. Mom, imagine waking up in the morning and finding your coffee already brewed. The coffee machine brewed it because it knew from your alarm clock what time you were waking up.
IoT will also help make sure your oven is off, let your lost keys be easily found, and let your fridge check how many eggs are left while you’re standing in the grocery store. The possibilities are limitless.
And it doesn’t stop there. Medical devices are communicating with doctors, jet engines are communicating with their manufacturers, and parking meters are communicating with city services.
And guess what? There are people watching the computers that collect all of that information, and they’re working hard to figure out what value they can deliver by using it effectively. It’s actually this machine data that’s going to make Big Data REALLY Big.
So does that mean your espresso maker, your cell phone and your car will be conspiring to take over the house? Probably not something we need to worry about this year (maybe keep an eye on the refrigerator just in case). But in the short term, it does mean that people like me, who have dedicated our careers to data management, will have our work cut out for us: figuring out how to make sense of all of this new machine interaction data from devices; then how to marry it with the people interaction data from social networks like Facebook, LinkedIn and Twitter; and then how to marry all of that with the transactional data that companies capture during the normal course of business. As I’ve said before, data’s a good business to be in!
Happy Mother’s Day!
There are lots of really fascinating applications coming out of the big data space as of late, and I recently came across one that may be the coolest of them all: a UK-based firm that is employing big data to help predict earthquakes.
Unfortunately, predicting earthquakes has thus far been almost impossible. Imagine if people living in an earthquake zone could get at least several hours’ notice, maybe even several days, just as those in the paths of hurricanes get advance warning and can flee or prepare. Hurricane and storm modeling is one of the earliest examples of big data in action, going back decades. The big data revolution may now be putting earthquake prediction modeling within reach as well.
Bernard Marr, in a recent Forbes post, explains how Terra Seismic employs satellite data to sense impending shakers:
“The systems use data from US, European and Asian satellite services, as well as ground based instruments, to measure abnormalities in the atmosphere caused by the release of energy and the release of gases, which are often detectable well before the physical quake happens. Large volumes of satellite data are taken each day from regions where seismic activity is ongoing or seems imminent. Custom algorithms analyze the satellite images and sensor data to extrapolate risk, based on historical facts of which combinations of circumstances have previously led to dangerous quakes.”
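Terra Seismic’s actual algorithms are proprietary, but the general pattern Marr describes, flagging readings that deviate sharply from recent history, can be illustrated with a simple rolling z-score in Python (the sensor values and thresholds here are invented for illustration):

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the trailing `window` of readings."""
    flags = []
    for i in range(window, len(readings)):
        base = readings[i - window:i]
        mu, sigma = mean(base), stdev(base)
        if sigma and abs(readings[i] - mu) > threshold * sigma:
            flags.append(i)
    return flags
```

A real pipeline would fuse many such signals (atmospheric gases, released energy, historical quake patterns) rather than thresholding a single series.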
So far, Marr reports, Terra Seismic has been able to predict major earthquakes anywhere in the world with 90% accuracy. Among them was a prediction, issued on February 22nd, that a 6.5-magnitude quake would hit the Indonesian island of Sumatra. The island was hit by a 6.4-magnitude quake on March 3rd.
There’s no question that the ability to accurately forecast earthquakes, at least as closely as hurricanes and major blizzards can be predicted, would not only save many human lives but also prove invaluable to government agencies and businesses.
At the same time, such creative, and potentially game-changing, applications of big data provide vivid examples of how data can be converted into insights that were never possible before. Many business leaders are looking for ways to shine a light on potential events within their organizations and markets, and examples such as Terra Seismic accentuate the positive benefits big data can deliver.
Terra Seismic’s forecasts are available through a public website: http://quakehunters.com/
Partners play an instrumental role in Informatica’s business and have for many years. But in some years unique partnerships truly blossom, and both companies come together to do really special things that neither could have conceived of alone. That is the case in 2015 with our partnership with Cisco.
On April 28, Informatica was awarded ISV Partner of the Year for the Americas by Cisco in Montreal, Canada.
This year the Cisco award was given for two jointly created solutions. The first is an end-to-end Data Warehouse Optimization (DWO) solution. By combining Cisco UCS (Unified Computing System) with Hadoop, Informatica Big Data Edition (BDE), and Cisco Data Virtualization, a customer gains access to a powerful next-generation big data analytics platform for both structured and unstructured data.
This solution was created to help customers reduce both the CAPEX and OPEX IT expenditures associated with the rising costs of their Enterprise Data Warehouse. By offloading infrequently used or dark data, along with ETL (extract, transform and load) jobs and mappings, into the data lake / Hadoop, a customer can realize a 5-10X cost reduction.
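The arithmetic behind such offload savings is straightforward; the per-terabyte figures below are illustrative assumptions for the sketch, not Cisco or Informatica pricing:

```python
def offload_savings(tb_offloaded, edw_cost_per_tb, hadoop_cost_per_tb):
    """Savings from moving tb_offloaded terabytes of cold data
    (and its ETL workload) off the EDW onto a Hadoop-based lake."""
    savings = tb_offloaded * (edw_cost_per_tb - hadoop_cost_per_tb)
    ratio = edw_cost_per_tb / hadoop_cost_per_tb
    return savings, ratio

# Hypothetical rates: $30,000/TB on the EDW vs. $3,000/TB on Hadoop
savings, ratio = offload_savings(100, 30_000.0, 3_000.0)
```

At those assumed rates the platform cost differs by 10x, consistent with the upper end of the 5-10X range quoted above.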
The second solution that Cisco recognized was a jointly created Internet of Things (IoT) / Internet of Everything (IoE) offering. With the explosion of sensor, social, and internet-based data, the two companies recognized the need for a solution that would incorporate data from “the edge” into a customer’s mainstream data repository (EDW, BD, Hadoop, etc.).
This solution couples Cisco routers and hardened devices that collect sensor data (i.e., telemetry data) with Informatica’s real-time data ingestion and analytics capabilities. By combining these technologies, a customer can aggregate data from every source where it captures data, gaining a 360-degree view of its business for competitive advantage.
From our announcement in February: “The Data Warehouse Optimization solution is about enabling organizations to more easily leverage all their data assets, current and historical, transaction and interaction, for more effective analytics while reducing their data management costs,” said Mike Flannagan, vice president and general manager of Data and Analytics at Cisco. “More than the sum of its parts, the solution’s Cisco and Informatica elements work synergistically to meet the demands of big data, to respond quickly to changing information needs, and to deliver insights that drive increased competitiveness and business innovation.”
Every organization on the planet is working hard to gather, analyze and make decisions based on their data. The joint Informatica and Cisco solutions are critical to helping customers to become Data Ready enterprises today and for years to come.
Moving forward, Cisco and Informatica will continue to collaborate on how best to build and deliver on-premises, cloud-based, and hybrid solutions so that customers have best-of-breed options to solve their exploding data and complex IT problems.