Category Archives: Big Data

(Re)Thinking Data Security Strategy

Data security is usually something people only think about when they get hacked, when a place they do business with gets hacked, or when they lose their credit card or wallet. It is human nature not to worry about things you cannot see and that seem to be well in hand. Instead, I would suggest that every company (and person) take just 15 minutes once a month to think about the items below, which need to be part of their data security strategy.

Data security is a complex issue with many facets. I will skip past how you create and use passwords, as that is the one area that gets a lot of focus. With SaaS and cloud-based technologies now widely accepted, both by companies and by people in their personal lives, it is also time to take a few moments to consider just how your data is secured or, in some cases, at risk.

Data-centric security. Traditionally, enterprise security has focused on access: what can be accessed, from where, and by whom. The problem with this walled-garden approach is that its technologies and procedures do not take into account the common ways data is actually used. Most data security programs are also outdated in a world where the majority of companies use systems they do not own or directly manage (e.g., SaaS, cloud, mobile) and where people, systems, and applications create many different types of data. Many enterprise security strategies need to move beyond access to also cover data usage and the ontology of the data being used.

Question: Does your company have a modern enterprise security strategy or a walled garden approach?

Data about data. Long ago, to make it easier to store, search, and retrieve data, people figured out that it would be useful to add descriptive information about what is in a data file. The term for this is metadata, and it is no different from the labels people put on file folders before we started moving everything to software-based storage. The problem is that metadata has grown much richer, and it can let people learn a lot of personal, business, and proprietary information without ever getting access to the underlying file. The richer the metadata, the more business or personal risk it creates, because it can expose information without exposing the underlying data.

Question: Are you accidentally exposing sensitive information in your metadata?

At-rest data. The reason people used to say to keep your tax records for 3 years and then destroy them is that everything was stored in file cabinets, in drawers, or under a mattress. Some people still like physical records, but for most people and companies data is stored electronically and has been for a long time. SaaS and cloud-based solutions add a new wrinkle, because the data is stored somewhere you do not necessarily have direct access to. In many cases the data is also stored multiple times if it is archived or backed up. And even when data is deleted, it is often not really gone: with the right technology, data can be recovered if it was not fully erased from the storage system that held it.

Question: Do you know where your data is stored? Archived? Backed up?

Question: Do you know how you would dispose of sensitive data that is no longer needed?

In-flight data. No, this is not the Wi-Fi on the airplane. This is literally the data and metadata as they are being used by applications in the regular course of business. The issue is that data can be at risk while it is being transmitted. This is one reason people are warned to be careful about how they use public Wi-Fi: any competent hacker can see unencrypted data traveling over the network (yes, it really is that easy). Another issue enterprises often need to deal with is data cleansing to reduce duplicates or errors. The problem is how to do this with sensitive data (e.g., HR or financial records) that you do not want developers or IT staff actually seeing.

Question: How does your company safeguard transactional and in-flight data?

Question: Does your company use data masking and cleansing technology to safeguard in-flight data?
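To make the idea of masking sensitive records before cleansing a little more concrete, here is a minimal sketch in Python. It assumes one particular masking technique, deterministic keyed hashing (HMAC-SHA256 from Python's standard library), and the key, field names, and records are all hypothetical. Because the same input always produces the same token, staff can still find duplicate HR records without ever seeing names, SSNs, or salaries. This is only one of several masking approaches, not a description of any specific product.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept in a key management
# system and never shared with the people working on the masked data.
MASKING_KEY = b"replace-with-a-secret-key"

# Hypothetical set of fields treated as sensitive in an HR record.
SENSITIVE_FIELDS = {"name", "ssn", "salary"}


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a value with a keyed hash token.

    The same input always yields the same token, so duplicates still match,
    but the original value cannot be read back from the token.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]  # truncated only to keep tokens readable


def mask_record(record: dict) -> dict:
    """Return a copy of the record with every sensitive field masked."""
    return {
        field: mask_value(str(value)) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


def find_duplicates(records: list[dict], match_field: str) -> list[tuple[int, int]]:
    """Return (first_index, duplicate_index) pairs with equal match_field values."""
    first_seen: dict[str, int] = {}
    duplicates = []
    for i, record in enumerate(records):
        token = record[match_field]
        if token in first_seen:
            duplicates.append((first_seen[token], i))
        else:
            first_seen[token] = i
    return duplicates


hr_records = [
    {"employee_id": 1, "name": "Ann Lee", "ssn": "123-45-6789", "salary": 90000},
    {"employee_id": 2, "name": "Bob Ray", "ssn": "987-65-4321", "salary": 85000},
    {"employee_id": 3, "name": "Ann Lee", "ssn": "123-45-6789", "salary": 90000},
]
masked = [mask_record(r) for r in hr_records]
print(find_duplicates(masked, "ssn"))  # [(0, 2)]
```

The design choice here is determinism: random masking would hide the values equally well, but it would also make duplicates impossible to detect.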

Data. Yes, the actual data or information that you care about, or just store because it is so easy. I would recommend that companies look holistically at their data and think of it across its lifecycle. In this approach, data risks should be identified for how data is stored, used, transmitted, exposed internally or externally, and accessed for data integration. There are some new and interesting solutions coming to market in the area of Security Intelligence that go beyond traditional data security, masking, and cleansing to help identify and assess data security risks. Security Intelligence solutions are meant to measure security risk and identify issues so that (a) those issues can be addressed before they become big problems, and (b) automated procedures can be put in place to improve security or bring it up to the desired level.

One example is a new solution from Informatica called Secure@Source, which is just coming to market. It is meant to provide automated analysis so that enterprises can determine their data risks, make improvements, and then put in place new policies and automated procedures that maintain the desired level of data security. Similar solutions have been used for network security for years, but these newer solutions, while using similar approaches, deal with the more specific issues of data security.

Question: What is your company doing to proactively assess and manage data risk? Are you a good candidate for a security intelligence approach?

Data security is an important issue for which every company should have a strategy. This is not meant to be an all-encompassing list, but it is a good starting place for a discussion. Stay secure. Don’t be the next company in the news with a data security issue.

Posted in Architects, Big Data, Business Impact / Benefits, Cloud Computing, Data Governance, Data Integration, Enterprise Data Management, Master Data Management

Great Data Drives Profits

We live in an age where technology is intertwined with our personal and business lives. Our machines, devices, and activities on social networks and the web are generating an explosion of data. Most executives in commercial or public enterprises believe this trove of big data holds the keys to untold insights. In fact, a study from the Economist Intelligence Unit reports that some 97% of executives believe that unlocking the value of this data is a strategic priority.

This is strongly confirmed by CIO’s 2014 annual survey, in which 72% of respondents named “data use” as their top priority. To fully grasp the importance of this data, you need to consider what it potentially represents for the business. And that requires a little journey back through our computing history.

Back-office and front-office automation

The first generations of software focused primarily on automating back-office functions such as finance, manufacturing, and procurement. Data pretty much reflected an event or, more accurately, a transaction.

The same was generally the case for the next wave of front-office applications, such as CRM. Data remained a by-product and was analyzed from time to time to reveal trends and insights for optimizing business processes. Its value lay in helping organizations achieve the most profitable outcomes. In this age of productivity, we used data to analyze where we sold best, which products we sold most of, which region or division was most profitable, and so on.

Enter the age of engagement

The last decade has given rise to new technologies and applications that operate in a different fashion. Data from social platforms like Facebook or from mobile phones no longer represents only transactions or events; it reflects actual behaviors. It shows what you like or dislike, where you are, and who the members of your household or circle of friends are. Most people refer to this as engagement data.

This engagement data has potentially massive value when you can combine it with the traditional data landscape. If you are interacting with a company (e.g., shopping on a website), the company can feed you relevant pieces of information at the right time, shaping the outcome in a direction it desires. For example, if you shop on Amazon, you likely start with a regular search. But then things change compared to a normal Google search. Suddenly you get recommendations, reviews, alternative products, add-on products, and special offers. Every time you click on a link, the ensuing recommendations become more personalized and targeted, with new suggestions and offers designed to lead you toward a desired outcome: in Amazon’s case, a purchase.

At the heart of this experience is data: engagement data has become the fuel that delivers the most engaging experience and nudges the user toward a desired outcome. Of course, this is not limited to retail; it applies in every industry.

Companies are racing to unlock the value of the mass of data they collect in order to build new data-centric products and fuel their own customer engagements, and to do so faster and better than the competition. But the “Data Directive” study makes a sobering statement in this regard: only 12% of the executives thought they were highly effective at this, and only 15% thought they were better at it than the competition.

The imperative to design for great data

The reason many companies fall short of their strategic intent with data is locked up in our historical approach to data. Data was never the centerpiece of our thinking; the applications were. The relatively few back-office and even front-office applications looked pretty similar, with similar data structures and concepts. But fueling these types of engagements as they happen requires different thinking and potentially different technologies. The most fundamental question is how good the data needs to be for these engagements to succeed.

Let’s consider two examples:

  1. The Google self-driving car has the same functional systems as a normal car to turn, brake, and accelerate. However, what makes the car drive and steer in a (hopefully) proper manner is data: engagement data from the navigation system, sensors, and cameras. How accurate does this data need to be, though?
  2. And revisiting our Amazon example, what are the chances of you selecting the recommended product if you already purchased that product last week when you were on the site?

There is an imperative for great data: To be able to design our environment in such a way that we can deliver great data continuously to any place or person that needs it. Great data is simply data that is clean, safe, and connected.

  • Clean: Data that is accurate, de-duplicated, timely, and complete.
  • Safe: Data that is treated with the appropriate level of security and confidentiality based on the policies regulating your business or industry.
  • Connected: Data that links all the silos to reflect the whole truth consistently.
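To make the “clean” criterion a little more concrete, here is a minimal Python sketch of two of its checks, completeness and de-duplication, applied to a list of customer records. The record layout, the required fields, and the matching rule (a trimmed, lower-cased email address as the duplicate key) are hypothetical assumptions for illustration; real data quality tooling uses much richer matching rules.

```python
# Hypothetical required fields for a customer record.
REQUIRED_FIELDS = ("customer_id", "name", "email")


def is_complete(record: dict) -> bool:
    """A record is complete if every required field is present and non-blank."""
    return all(str(record.get(field, "")).strip() for field in REQUIRED_FIELDS)


def match_key(record: dict) -> str:
    """Normalize the email so trivially different spellings still match."""
    return str(record.get("email", "")).strip().lower()


def clean(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, rejected).

    Incomplete records are rejected; among complete records, only the
    first occurrence of each normalized email is kept.
    """
    kept, rejected = [], []
    seen_keys = set()
    for record in records:
        if not is_complete(record):
            rejected.append(record)
            continue
        key = match_key(record)
        if key in seen_keys:
            rejected.append(record)  # duplicate of a record already kept
        else:
            seen_keys.add(key)
            kept.append(record)
    return kept, rejected


customers = [
    {"customer_id": "C1", "name": "Ana Diaz", "email": "ana@example.com"},
    {"customer_id": "C2", "name": "Ana Diaz", "email": " ANA@example.com "},
    {"customer_id": "C3", "name": "", "email": "raj@example.com"},
    {"customer_id": "C4", "name": "Raj Patel", "email": "raj@example.com"},
]
kept, rejected = clean(customers)
print([r["customer_id"] for r in kept])      # ['C1', 'C4']
print([r["customer_id"] for r in rejected])  # ['C2', 'C3']
```

Note that C4 survives even though it shares an email with C3, because C3 was rejected as incomplete before its key was recorded.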

The challenge for most companies is that they can deliver clean, safe, and connected data for only one project at a time, through the heroic efforts of scarce IT resources. And this process starts over every time a new request is received.

More often than not, each new project is executed by a different team using different technologies, resulting in poor ROI and slow time-to-value. If we are going to broadly unlock the value in our data to drive the next wave of profits, it is time to think of systematic ways to deliver great data to every process, person, or application. Otherwise we risk becoming one of the 85% of companies that lag their competitors in the ability to unlock the strategic value of data.

Some parting thoughts

Consider the following three practical themes as you develop your own great data strategy:

  1. Designing a system of integration that can continuously deliver great data requires that you standardize across all your use cases, all your projects, and all data types, whether on-premise, in the cloud, or both.
  2. Establishing end-to-end metadata that is shared across your entire data system. If our data tells us about our business, metadata tells us about our data system.
  3. Designing for self-service and business-IT collaboration. Data is the responsibility of everyone in the company – not just IT.

For more on the imperative of putting data at the center of your organization’s future efforts, read the ebook, Data Drives Profit.

Posted in Big Data

The Internet of Things, So Mom Can Understand

Dear Mom,

It’s great to hear that you’re feeling pretty hip about your ability to explain data, metadata, and Big Data to your friends. I did get your panicked voicemail asking about the Internet of Things (IoT). Yes, you’re right that the Internet itself is a thing, so it’s confusing to know whether the Internet is one of the things the IoT includes. So let’s make it a bit simpler so you can feel more comfortable about the topic at your bridge game next week. (I’m still shocked you’re talking about data with your friends; Dad complains you only talk with him about “Dancing with the Stars.”)

First, let’s describe the Internet itself. You use it every day when you do a Google search: it’s a publicly accessible network of computer systems from around the world. Someday the Internet will hopefully allow people to share all human knowledge with one another.

But what about knowledge that’s not “human”? It’s not only people that create information. The “things” in IoT are the devices and machines that people, companies, and governments rely upon every day. Believe it or not, there are billions of devices today that are also “connected,” meaning they have the ability to send information to other computers. In fact, the technology research firm Gartner says there will be 4.9 billion connected devices, or “things,” in 2015. And by 2020, that number will reach 25 billion!

Some of these connected devices are more obvious than others, and some have been around a very long time. For example, you use ATMs at the bank all the time to withdraw and deposit money. Clearly those machines can only access your account information if they’re connected to the bank’s computer systems that hold it.

Your iPhone, your “smart” thermostat, and your washing machine with the call-home feature are all connected “things” too. Mom, imagine waking up in the morning to find your coffee already brewed. The coffee machine started brewing because it knew from your alarm clock what time you were waking up.

IoT will also help make sure your oven is off and your lost keys can be easily found, and it will let your fridge tell you how many eggs are left while you’re standing in the grocery store. The possibilities are limitless.

And it doesn’t stop there. Medical devices are communicating with doctors, jet engines are communicating with their manufacturers, and parking meters are communicating with city services.

And guess what? There are people watching the computers that collect all of that information, and they’re working hard to figure out what value they can deliver by using it effectively. It’s actually this machine data that’s going to make Big Data REALLY Big.

So does that mean your espresso maker, your cell phone, and your car will be conspiring to take over the house? Probably not something we need to worry about this year (though maybe keep an eye on the refrigerator, just in case). But in the short term, it will mean that people like me, who have dedicated our careers to data management, will have our work cut out for us figuring out how to make sense of all this new machine interaction data from devices. Then how to marry it with the people interaction data from social networks like Facebook, LinkedIn, and Twitter. And then how to marry all of that with the transactional data that companies capture in the normal course of business. As I’ve said before, data’s a good business to be in!

Happy Mother’s Day!

Love, Rob

Posted in Big Data

Big Data’s Next Big Frontier: Earthquake Prediction

There are lots of really fascinating applications coming out of the big data space lately, and I recently came across one that may be the coolest of them all: a UK-based firm that is employing big data to help predict earthquakes.

Unfortunately, predicting earthquakes has so far been almost impossible. Imagine if people living in an earthquake zone could get at least several hours’ notice, maybe even several days, just as those in the paths of hurricanes get advance warning and can flee or prepare. Hurricane and storm modeling is one of the earliest examples of big data in action, going back decades. The big data revolution may now be on the verge of extending to earthquake prediction as well.

Bernard Marr, in a recent Forbes post, explains how Terra Seismic employs satellite data to sense impending shakers:

“The systems use data from US, European and Asian satellite services, as well as ground based instruments, to measure abnormalities in the atmosphere caused by the release of energy and the release of gases, which are often detectable well before the physical quake happens. Large volumes of satellite data are taken each day from regions where seismic activity is ongoing or seems imminent. Custom algorithms analyze the satellite images and sensor data to extrapolate risk, based on historical facts of which combinations of circumstances have previously led to dangerous quakes.”

So far, Marr reports, Terra Seismic has been able to predict major earthquakes anywhere in the world with 90% accuracy. Among them is a prediction, issued on February 22nd, that a 6.5-magnitude quake would hit the Indonesian island of Sumatra. The island was hit by a 6.4-magnitude quake on March 3rd.

There’s no question that the ability to accurately forecast earthquakes, at least as closely as hurricanes and major blizzards can be predicted, would not only save many human lives but also be invaluable to government agencies and businesses.

At the same time, such creative and potentially game-changing applications of big data provide very graphic examples of how data can be converted into insights that were never possible before. Many business leaders are looking for ways to shine a light on potential events within their organizations and markets, and examples such as Terra Seismic highlight the positive benefits big data can deliver.

Terra Seismic’s forecasts are available through a public website:

Posted in Big Data

Informatica Wins Cisco’s 2014 ISV Partner of the Year – Americas

Partners play an instrumental role in Informatica’s business and have for many years. But in some years, unique partnerships truly blossom, and the two companies come together to do really special things that neither could have conceived of alone. That is the case in 2015 with our partnership with Cisco.

On April 28, Informatica was awarded ISV Partner of the Year for the Americas by Cisco in Montreal, Canada.

This year the Cisco award was given for two jointly created solutions. The first is an end-to-end Data Warehouse Optimization (DWO) solution. By combining Cisco UCS (Unified Computing System) with Hadoop, Informatica Big Data Edition (BDE), and Cisco Data Virtualization, a customer gains access to a powerful next-generation big data analytics platform for both structured and unstructured data.

This solution was created to help customers reduce both the CAPEX and the OPEX of their increasingly costly Enterprise Data Warehouse. By offloading infrequently used or dark data, along with ETL (extract, transform, and load) jobs and mappings, into the Data Lake / Hadoop, a customer can realize a 5-10X cost reduction.

The second solution recognized by Cisco was a jointly created Internet of Things (IoT) / Internet of Everything (IoE) offering. With the explosion of sensor, social, and internet-based data, the two companies recognized the need for a solution that would incorporate data from “the edge” into a customer’s mainstream data repository (EDW, BD, Hadoop, etc.).

This solution includes Cisco routers and hardened devices to collect sensor data (i.e., telemetry data), coupled with Informatica’s real-time data ingestion and analytics capabilities. By combining these technologies, a customer can aggregate data from every source where it captures data, gaining a 360-degree view of its business for competitive advantage.

From our announcement in February: “The Data Warehouse Optimization solution is about enabling organizations to more easily leverage all their data assets – current and historical, transaction and interaction – for more effective analytics while reducing their data management costs,” said Mike Flannagan, vice president and general manager of Data and Analytics at Cisco. “More than the sum of its parts, the solution’s Cisco and Informatica elements work synergistically to meet the demands of big data, to respond quickly to changing information needs, and to deliver insights that drive increased competitiveness and business innovation.”

Every organization on the planet is working hard to gather, analyze and make decisions based on their data. The joint Informatica and Cisco solutions are critical to helping customers to become Data Ready enterprises today and for years to come.

Moving forward, Cisco and Informatica will continue to collaborate on how best to build and deliver on-premise, cloud-based, and hybrid solutions so that customers have best-of-breed solutions for their exploding data and complex IT problems.

Posted in Big Data

There is Just One V in Big Data

According to Gartner, 64% of organizations surveyed have already invested or are planning to invest in Big Data systems. More and more companies are diving into their data, trying to put it to use to minimize customer churn, analyze financial risk, and improve the customer experience.

Of that 64%, 30% have already invested in Big Data technology, 19% plan to invest within the next year, and another 15% plan to invest within two years. However, fewer than 8% of Gartner’s 720 respondents have actually deployed Big Data technology. That gap suggests that most companies simply don’t know what they’re doing when it comes to Big Data.

Over the years, we have heard that Big Data is Volume, Velocity, and Variety. I feel this limited view is one of the reasons why, despite the Big Data hype, most companies are still stuck in neutral.

  1. Volume: Terabytes to exabytes, petabytes to zettabytes of data
  2. Velocity: Streaming data, milliseconds to seconds; how fast data is produced, and how fast it must be processed to meet the need or demand
  3. Variety: Structured, unstructured, text, multimedia, video, audio, sensor data, meter data, HTML, e-mails, etc.

For us, the focus is on collecting data. After all, we are prone to be hoarders, wired by our survival instinct to collect and hoard for the leaner winter months that may come. So we hoard as much data as we can for the elusive “What if?” scenario: “Maybe this will be useful someday.” It’s this stockpiling of Big Data without application that makes it useless.

While Volume, Velocity, and Variety are focused on collection of data, Gartner, in 2014, introduced 3 additional Vs: Veracity, Variability, and Value which focus on usefulness of the data.

  1. Veracity: Uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations, accuracy, quality, truthfulness or trustworthiness
  2. Variability: The differing ways in which the data may be interpreted; different questions require different interpretations
  3. Value: Data for co-creation and deep learning

I believe that perfecting as few as 5% of the relevant variables will get a business 95% of the same benefit. The trick is identifying that viable 5%, and extracting meaningful information from it. In other words, “Value” is the long pole in the tent.

Twitter @bigdatabeat

Posted in Big Data, Business Impact / Benefits, Business/IT Collaboration, CIO, Hadoop

Succeeding with Analytics in a Big Data World

Big data is receiving a lot of press these days, including from this author. While there continues to be constructive dialog about whether volume, velocity, or variety is the most important attribute of the big data movement, one thing is clear. Constructed correctly, big data has the potential to transform businesses by increasing sales and operational efficiencies. More importantly, when combined with predictive analytics, big data can improve customer experience, enable better targeting of potential customers, and improve the core business capabilities that are foundational to a business’s right to win.

The problem many in the vanguard have discovered is that their big data projects are fraught with risk if they are not built upon a solid data management foundation. During the Big Data Summit, you will learn directly from the vanguard of big data how they have successfully transitioned from the traditional world of data management to the new world of big data analytics. Hear from market-leading enterprises like Johnson and Johnson, Transamerica, Devon Energy, KPN, and Western Union. Also hear from Tom Davenport, Distinguished Professor in Management and Information Technology at Babson College and bestselling author of “Competing on Analytics” and “Big Data at Work.” In particular, Tom will share his perspective from interviewing hundreds of companies about the successes and failures of their big data initiatives. Tom Davenport initially thought big data was just another example of technology hype, but his research on big data changed his mind. Finally, hear from big data thought leaders including Cloudera, Hortonworks, Cognizant, and Capgemini. They are all here to share their stories on how to avoid common pitfalls and accelerate your analytical returns in a big data world.

To attend in person, please join us on Tuesday the 12th at 1:30 in Las Vegas at the Big Data Summit. If you cannot join us in person, I will be sharing live tweets and videos on Twitter starting at 1:30 PST. Look for me at @MylesSuer on Twitter to follow along.

Related Blogs

What is Big Data and why should your business care?
Big Data: Does the emperor have their clothes on?
Should We Still be calling it Big Data?
CIO explains the importance of Big Data to healthcare
Big Data implementations need a systems view and to put in place trustworthy data.
The state of predictive analytics
Analytics should be built upon Business Strategy
Analytics requires continuous improvement too?
When should you acquire a data scientist or two?
Who Owns Enterprise Analytics and Data?
Competing on Analytics: A Follow Up to Thomas H. Davenport’s Post in HBR
Thomas Davenport Book “Competing On Analytics”
Analytics Stories: A Banking Case Study
Analytics Stories: A Financial Services Case Study
Analytics Stories: A Healthcare Case Study

Author Twitter: @MylesSuer


Posted in 5 Sales Plays, Big Data, CIO, Informatica World 2015

Big Data Crashes Twitter Earnings

Unmanaged Data may Result in Loss

The unintended consequences of big data, real-time data, and loose data security all showed up in Twitter’s Q1 earnings release this week, demonstrating how a lack of good control over data can cause bad things to happen.

What happens when information is accidentally made public? In this case the damage might seem small, with only 18% shaved off Twitter’s share price, but that amounts to about a $5 billion drop in valuation, with millions changing hands for some investors. Given that earnings were a miss, the stock might have dropped anyway, but the element of surprise may have increased the impact of the news.

This is a great example to anyone wondering about data security and what it means to properly manage public or private data. This episode will pass but it will leave a blemish on Twitter’s reputation and should lead them to take a closer look at how they are managing data that is accessible publicly or meant to be secured.

In this case the leak occurred when Selerity, a company that provides real-time content analytics specializing in financial market data and sentiment, used one of its web crawlers to pick up Twitter’s earnings release from a PDF posted on Twitter’s public investor relations website, about an hour before Twitter pushed out the release. Selerity simply reported information that sat at a hidden URL but was effectively public, since it was not secured. (A great lesson for many non-technical marketing and PR people.)

My favorite quote from Selerity comes from The Verge. “Any time a company’s earnings are due for release, we check the website periodically to see if the earnings are available. In this instance, I am assuming that Twitter mistakenly posted the earnings to the website early. But they did make the earnings available on the website.”

Earnings have been accidentally released many times before, and this will not be the last. The good news is that there are any number of simple ways to control data so that it is not accidentally or purposely made publicly accessible, both through data publishing best practices and through data management and security products. This is a good reminder for companies to review their data publishing, management, and security practices and policies. Do nothing, and your company could be the next one being talked about.
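One such publishing practice is to have the web server refuse to serve an embargoed document until its release time, instead of relying on an unlinked URL staying undiscovered. The Python sketch below is a hypothetical, minimal illustration of that check; the document path, the release time, and the choice of returning 404 are invented assumptions, not a description of how Twitter or any particular content system works.

```python
from datetime import datetime, timezone

# Hypothetical embargo table: document path -> earliest public release time (UTC).
# The path and time are invented for illustration.
EMBARGOES = {
    "/investors/quarterly-results.pdf": datetime(2025, 1, 30, 21, 0, tzinfo=timezone.utc),
}


def status_for_request(path: str, now: datetime) -> int:
    """Return the HTTP status code a server should send for a document request.

    An embargoed document answers 404 until its release time, so a crawler
    polling an unlinked URL learns nothing early. Paths without an embargo
    entry are treated as already public (200) in this simplified sketch.
    """
    release_at = EMBARGOES.get(path)
    if release_at is not None and now < release_at:
        return 404  # behave as if the file does not exist yet
    return 200


before = datetime(2025, 1, 30, 20, 0, tzinfo=timezone.utc)
after = datetime(2025, 1, 30, 21, 30, tzinfo=timezone.utc)
print(status_for_request("/investors/quarterly-results.pdf", before))  # 404
print(status_for_request("/investors/quarterly-results.pdf", after))   # 200
```

Simpler still is not uploading the file at all until release time; a check like this is a backstop for when someone uploads early by mistake.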

Posted in Big Data

How Do You Know if Your Business is Not Wasting Money on Big Data?


While CIOs are urged to rethink their backup strategies following warnings from leading analysts that companies are wasting billions on unnecessary storage, consultants and IT solution vendors are selling “Big Data” narratives to these CIOs as a storage optimization strategy.


What a CIO must do is ask:

Do you think a backup strategy is the same as a Big Data strategy?

Is your MO – “I must invest in Big Data because my competitor is”?

Do you think Big Data and “data analysis” are synonyms?

Most companies invest very little in their storage technologies, while spending on server and network technologies primarily for backup. Further, the most common mistake businesses make is to fail to update their backup policies. It is not unusual for companies to be using backup policies that are years or even decades old, which do not discriminate between business-critical files and the personal music files of employees.

Even web giants like Facebook and Yahoo generally aren’t dealing with Big Data. They run their own giant in-house “clusters” (collections of powerful servers) for crunching data, but it appears those clusters are unnecessary for many of the tasks they’re handed. In the case of Facebook, most of the jobs engineers ask their clusters to perform are in the “megabyte to gigabyte” range, which means they could easily be handled on a single computer, even a laptop.

The necessity of breaking problems into many small parts, and processing each on a large array of computers, characterizes classic Big Data problems like Google’s need to compute the rank of every single web page on the planet.

In “Nobody ever got fired for buying a cluster,” Microsoft Research points out that many of the problems solved by engineers at even the most data-hungry firms don’t need to be run on clusters. Why is that a problem? Because there are vast classes of problems for which these clusters are a relatively inefficient, or even inappropriate, solution.
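To illustrate why a “megabyte to gigabyte” job rarely needs a cluster, here is a minimal Python sketch that aggregates a CSV file on a single machine by streaming it one row at a time, so memory use stays roughly constant however large the file is. The column names (region, amount) and the file name in the comment are hypothetical; the sketch only shows that this kind of job does not inherently require splitting work across many servers.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def total_by_region(lines: TextIO) -> dict[str, float]:
    """Sum the 'amount' column per 'region', reading one row at a time."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


# A tiny in-memory stand-in for what could be a multi-gigabyte file; with a
# real file you would pass open("sales.csv", newline="") instead.
sample = io.StringIO("region,amount\nwest,10.5\neast,4\nwest,2.5\n")
print(total_by_region(sample))  # {'west': 13.0, 'east': 4.0}
```

Only the running totals are held in memory, which is why a laptop can work through files far larger than its RAM.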

Here is an example of a post exhorting readers to “Incorporate Big Data Into Your Small Business” that is about a quantity of data that probably wouldn’t strain Google Docs, much less Excel on a single laptop. In other words, most businesses are dealing with small data. It’s very important stuff, but it has little connection to the big kind.

Let us lose the habit of putting “big” in front of data to make it sound important. After all, supersizing your data, just because you can, is going to cost you a lot more and may yield a lot less.

So what is it? Big Data, small Data, or Smart Data?

Gregor Mendel uncovered the secrets of genetic inheritance with just enough data to fill a notebook. The important thing is gathering the right data, not gathering some arbitrary quantity of it.

Twitter @bigdatabeat

Posted in Big Data