Tag Archives: Big Data
In recent times, the big Internet companies – the Googles, Yahoos and eBays – have proven that it is possible to build a sustainable business on data analytics, in which corporate decisions and actions are seamlessly guided by an analytics culture based on data, measurement and quantifiable results. Now, two of the top data analytics thinkers say we are reaching a point where non-tech, non-Internet companies are on their way to becoming analytics-driven organizations in a similar vein, as part of an emerging data economy.
In a report written for the International Institute for Analytics, Thomas Davenport and Jill Dyché divulge the results of their interviews with 20 large organizations, in which they find big data analytics to be well integrated into the decision-making cycle. “Large organizations across industries are joining the data economy,” they observe. “They are not keeping traditional analytics and big data separate, but are combining them to form a new synthesis.”
Davenport and Dyché call this new state of management “Analytics 3.0,” in which the concept and practices of competing on analytics are no longer confined to data management and IT departments or quants – analytics is embedded into all key organizational processes. That means major, transformative effects for organizations. “There is little doubt that analytics can transform organizations, and the firms that lead the 3.0 charge will seize the most value,” they write.
Analytics 3.0 is the latest of three distinct phases in the way data analytics has been applied to business decision making, Davenport and Dyché say. The “eras” look like this:
- Analytics 1.0, prevalent between 1954 and 2009, was based on relatively small, structured data sets from internal corporate sources.
- Analytics 2.0, which arose between 2005 and 2012, saw the rise of the big Web companies – the Googles and Yahoos and eBays – which were leveraging big data stores and employing prescriptive analytics to target customers and shape offerings. This time span was also shaped by a growing interest in competing on analytics, in which data was applied to strategic business decision-making. “However, large companies often confined their analytical efforts to basic information domains like customer or product, that were highly-structured and rarely integrated with other data,” the authors write.
- In the Analytics 3.0 era, analytical efforts are being integrated with other data types, across enterprises.
This emerging environment “combines the best of 1.0 and 2.0—a blend of big data and traditional analytics that yields insights and offerings with speed and impact,” Davenport and Dyché say. The key trait of Analytics 3.0 “is that not only online firms, but virtually any type of firm in any industry, can participate in the data-driven economy. Banks, industrial manufacturers, health care providers, retailers—any company in any industry that is willing to exploit the possibilities—can all develop data-based offerings for customers, as well as supporting internal decisions with big data.”
Davenport and Dyché describe how one major trucking and transportation company has been able to implement low-cost sensors for its trucks, trailers and intermodal containers, which “monitor location, driving behaviors, fuel levels and whether a trailer/container is loaded or empty. The quality of the optimized decisions [the company] makes with the sensor data – dispatching of trucks and containers, for example – is improving substantially, and the company’s use of prescriptive analytics is changing job roles and relationships.”
New technologies and methods are helping enterprises enter the Analytics 3.0 realm, including “a variety of hardware/software architectures, including clustered parallel servers using Hadoop/MapReduce, in-memory analytics, and in-database processing,” the authors add. “All of these technologies are considerably faster than previous generations of technology for data management and analysis. Analyses that might have taken hours or days in the past can be done in seconds.”
Another key characteristic of big data analytics-driven enterprises is the ability to fail fast – to deliver, with great frequency, partial outputs to project stakeholders. With the rise of new ‘agile’ analytical methods and machine learning techniques, organizations are capable of delivering “insights at a much faster rate,” and of providing “an ongoing sense of urgency.”
Perhaps most importantly, big data and analytics are integrated and embedded into corporate processes across the board. “Models in Analytics 3.0 are often being embedded into operational and decision processes, dramatically increasing their speed and impact,” Davenport and Dyché state. “Some are embedded into fully automated systems based on scoring algorithms or analytics-based rules. Some are built into consumer-oriented products and features. In any case, embedding the analytics into systems and processes not only means greater speed, but also makes it more difficult for decision-makers to avoid using analytics—usually a good thing.”
The report is available here.
Leo Eweani makes the case that the data tsunami is coming. “Businesses are scrambling to respond and spending accordingly. Demand for data analysts is up by 92%; 25% of IT budgets are spent on the data integration projects required to access the value locked up in this data “ore” – it certainly seems that enterprise is doing The Right Thing – but is it?”
Data is exploding within most enterprises. However, most enterprises have no clue how to manage this data effectively. While you would think that an investment in data integration would be an area of focus, many enterprises don’t have a great track record in making data integration work. “Scratch the surface, and it emerges that 83% of IT staff expect there to be no ROI at all on data integration projects and that they are notorious for being late, over-budget and incredibly risky.”
My core message is that enterprises need to ‘up their game’ when it comes to data integration. This recommendation is based upon the amount of data growth we’ve already experienced, and will experience in the near future. Indeed, a “data tsunami” is on the horizon, and most enterprises are ill-prepared for it.
So, how do you get prepared? While many would say it’s all about buying anything and everything when it comes to big data technology, the best approach is to splurge on planning. This means defining exactly what data assets are in place now, and will be in place in the future, and how they should or will be leveraged.
To face the forthcoming wave of data, certain planning aspects and questions about data integration rise to the top:
- Performance, including data latency. Or, how quickly does the data need to flow from point or points A to point or points B? As the volume of data quickly rises, the data integration engines have got to keep up.
- Data security and governance. Or, how will the data be protected both at-rest and in-flight, and how will the data be managed in terms of controls on use and change?
- Abstraction, and removing data complexity. Or, how will the enterprise remap and re-purpose key enterprise data that may not currently exist in a well-defined and functional structure?
- Integration with cloud-based data. Or, how will the enterprise link existing enterprise data assets with those that exist on remote cloud platforms?
While this may seem like a complex and risky process, think through the problems, leverage the right technology, and you can remove the risk and complexity. The enterprises that seem to fail at data integration do not follow that advice.
I suspect the explosion of data to be the biggest challenge enterprise IT will face in many years. While a few will take advantage of their data, most will struggle, at least initially. Which route will you take?
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review about how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions, ranging from what product to develop next to whom to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation, and as they spread, supply will attempt to catch up to create an equilibrium. However, this is about a problem that has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sectors. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into a meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why, you ask? Because at some point I would like to have at least a business super-user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in-memory DB…) and massage it so that his/her self-created dashboard gets him/her where (s)he needs to go. This should include questions like: “where is the data I need for this insight?”, “what is missing and how do I get to that piece in the best way?”, “how do I want it to look to share it?” All that is required should be a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
In a previous blog post, I wrote about how, by the time business “history” is reported via Business Intelligence (BI) systems, it’s usually too late to make a real difference. In this post, I’m going to talk about how business history becomes much more useful when combined operationally and in real time.
E. P. Thompson, a historian, pointed out that all history is the history of unintended consequences. His theory was that history is not always recorded in documents, but is ultimately derived from examining cultural meanings as well as the structures of society through hermeneutics (interpretation of texts), semiotics, and the many forms and signs of the times. He concluded that history is created by people’s subjectivity and is therefore ultimately represented as they REALLY live.
The same can be extrapolated to businesses. However, the BI systems of today capture only a minuscule piece of the larger pie of knowledge that may be gained from things like meetings, videos, sales calls, anecdotal win / loss reports, shadow IT projects, 10-Ks and 10-Qs, even company blog posts. The point is: how can you better capture the essence of meaning, and perhaps importance, from the everyday non-database events taking place in your company and its activities – in other words, how it REALLY operates?
One of the keys to figuring out how businesses really operate is identifying and utilizing those undocumented RULES that are usually underlying every business. Select company employees, often veterans, know these rules intuitively. If you watch them, and every company has them, they just have a knack for getting projects pushed through the system, or making customers happy, or diagnosing a problem in a short time and with little fanfare. They just know how things work and what needs to be done.
These rules have been, and still are, difficult to quantify and apply, or “datify” if you will. Certain companies (and hopefully Informatica) will end up being major players in the race to datify these non-traditional rules and events, in addition to helping companies make sense out of big data in a whole new way. But in daydreaming about it, it’s not hard to imagine business systems that will eventually be able to understand the optimization rules of a business, accounting for possible unintended scenarios or consequences, and then apply them at the time when they are most needed. Anyhow, that’s the goal of a new generation of Operational Intelligence systems.
In my final post on the subject, I’ll explain how it works and the business problems it solves (in a nutshell). And if I’ve managed to pique your curiosity and you want to hear about Operational Intelligence sooner, tune in to a webinar we’re having TODAY at 10 AM PST. Here’s the link.
Shhhh… RulePoint Programmer Hard at Work
End of year. Out with the old, in with the new. A time when everyone gets their ducks in order, clears the pipe and gets ready for the New Year. For R&D, one of the gating events driving the New Year is the annual sales kickoff event, where we present the new features to Sales so they can better communicate a product’s road map and value to potential buyers. All well and good. But part of the process is to fill out a Q and A that explains the product “Value Prop”, and they only gave us 4 lines. I think the answer also helps determine speaking slots and priority.
So here’s the question I had to fill out -
FOR SALES TO UNDERSTAND THE PRODUCT BETTER, WE ASK THAT YOU ANSWER THE FOLLOWING QUESTION:
WHAT IS THE PRODUCT VALUE PROPOSITION AND ARE THERE ANY SIGNIFICANT DEPLOYMENTS OR OTHER CUSTOMER EXPERIENCES YOU HAVE HAD THAT HAVE HELPED TO DEFINE THE PRODUCT OFFERING?
Here’s what I wrote:
Informatica RULEPOINT is a real-time integration and event processing software product that is deployed very innovatively by many businesses and vertical industries. Its value proposition is that it helps large enterprises discover important situations from their droves of data and events and then enables users to take timely action on discovered business opportunities as well as stop problems while or before they happen.
Here’s what I wanted to write:
RulePoint is scalable, low latency, flexible and extensible and was born in the pure and exotic wilds of the Amazon from the minds of natives that have never once spoken out loud – only programmed. RulePoint captures the essence of true wisdom of the greatest sages of yesteryear. It is the programming equivalent and captures what Esperanto linguistically tried to do but failed to accomplish.
As to high availability (HA), there has never been anything in the history of software as available as RulePoint. Madonna’s availability only pales in comparison to RulePoint’s availability. We are talking 8 Nines cubed and then squared. Oracle = Unavailable. IBM = Unavailable. Informatica RulePoint = Available.
RulePoint works hard, but plays hard too. When not solving those mission critical business problems, RulePoint creates Arias worthy of Grammy nominations. In the wee hours of the AM, RulePoint single-handedly prevented the outbreak and heartbreak of psoriasis in East Angola.
One of the little known benefits of RulePoint is its ability to train the trainer, coach the coach and play the player. Via chalk talks? No, RulePoint uses mind melds instead. Much more effective. RulePoint knows Chuck Norris. How do you think Chuck Norris became so famous in the first place? Yes, RulePoint. Greenpeace used RulePoint to save dozens of whales, 2 narwhal, a polar bear and a few collateral penguins (the bear was about to eat the penguins). RulePoint has been banned in 16 countries because it was TOO effective. “Veni, Vidi, RulePoint Vici” was Julius Caesar’s actual quote.
The inspiration for Gandalf in the Lord of the Rings? RulePoint. IT heads worldwide shudder with pride when they hear the name RulePoint mentioned and know that they acquired it. RulePoint is stirred but never shaken. RulePoint is used to train the Sherpas that help climbers reach the highest of heights. RulePoint cooks Minute rice in 20 seconds.
The running of the bulls in Pamplona every year - What do you think they are running from? Yes, RulePoint. RulePoint put the Vinyasa back into Yoga. In fact, RulePoint will eventually create a new derivative called Full Contact Vinyasa Yoga and it will eventually supplant gymnastics in the 2028 Summer Olympic games.
The laws of physics were disproved last year by RulePoint. RulePoint was drafted in the 9th round by the LA Lakers in the 90s, but opted instead to teach math to inner city youngsters. 5 years ago, RulePoint came up with an antivenin to the Black Mamba and has yet to ask for any form of recompense. RulePoint’s rules bend but never break. The stand-in for the “Mind” in the movie “A Beautiful Mind” was RulePoint.
RulePoint will define a new category for the Turing award and will name it the 2Turing Award. As a bonus, the 2Turing Award will then be modestly won by RulePoint and the whole category will be retired shortly thereafter. RulePoint is… tada… the most interesting software in the world.
But I didn’t get to write any of these true facts and product differentiators on the form. No room.
Hopefully I can still get a primo slot to talk about RulePoint.
And so from all the RulePoint and Emerging Technologies team, including sales and marketing, here’s hoping you have a great holiday season and a Happy New Year!
Unlike some of my friends, I truly enjoyed History as a subject in high school and college. I particularly appreciated biographies of favorite historical figures because they painted a human face and gave meaning and color to the past. I also vowed at that time to navigate my life and future under the principle attributed to Harvard professor Jorge Agustín Nicolás Ruiz de Santayana y Borrás that goes, “Those who cannot remember the past are condemned to repeat it.”
So that’s a little ditty regarding my history regarding history.
Forwarding now to the present in which I have carved out my career in technology, and in particular, enterprise software, I’m afforded a great platform where I talk to lots of IT and business leaders. When I do, I usually ask them, “How are you implementing advanced projects that help the business become more agile or effective or opportunistically proactive?” They usually answer something along the lines of “this is the age and renaissance of data science and analytics” and then end up talking exclusively about their meat and potatoes business intelligence software projects and how 300 reports now run their business.
Then when I probe and hear their answer more in depth, I am once again reminded of THE history quote and think to myself there’s an amusing irony at play here. When I think about the Business Intelligence systems of today, most are designed to “remember” and report on the historical past through large data warehouses of a gazillion transactions, along with basic, but numerous shipping and billing histories and maybe assorted support records.
But when it comes right down to it, business intelligence “history” is still just that. Nothing is really learned and applied right when and where it counted – AND when it would have made all the difference had the company been able to react in time.
So, in essence, by using standalone BI systems as they are designed today, companies are indeed condemned to repeat what they have already learned because they are too late – so the same mistakes will be repeated again and again.
This means the challenge for BI is to reduce latency, measure the pertinent data / sensors / events, and get scalable – extremely scalable and flexible enough to handle the volume and variety of the forthcoming data onslaught.
There’s a part 2 to this story so keep an eye out for my next blog post History Repeats Itself (Part 2)
I’m glad to hear you feel comfortable explaining data to your friends, and I completely understand why you’ll avoid discussing metadata with them. You’re in great company – most business leaders also avoid discussing metadata at all costs! You mentioned during our last call that you keep reading articles in the New York Times about this thing called “Big Data” so as promised I’ll try to explain it as best I can.
I missed Strata this year, so I can only report back what I heard from my team. I was out on the road talking with customers while the gang was at Strata, talking to customers and prospective customers. That said, the conversations they had with cool new Hadoop companies and my own conversations were quite similar. Lots of talk about trials on Hadoop, but outside of the big internet firms, some startups that are focused on solving “big data” problems and some Wall Street firms, most companies are still kicking the Hadoop tires.
Which reminds me of a picture my neighbor took of a presentation that he saw on Hadoop. The presenter had a slide with a rehash of an old joke that went something like this (I am paraphrasing here as I don’t have the exact quote):
“Hadoop is a lot like teenage sex. Everyone says they do it, but most are not. And for those who are doing it, most of them aren’t very good at it yet.”
So if you haven’t gotten started on your Hadoop project, don’t worry, you aren’t as far behind as you think.
My wife invited my new neighbors over for dinner this past Saturday night. They are a French couple with a super cute 5-year-old son. Dinner was nice, and like most ex-pats in the San Francisco Bay Area, the husband is in high tech. His company is a successful internet company in Europe, but it has had a hard time penetrating the U.S. market, which is why they moved to the Bay Area. He is starting up a satellite engineering organization in Palo Alto and he asked me where he can find good “big data” engineers. He is having a hard time finding people.
This is a story that I am hearing quite a bit from customers that I have been talking to as well. They want to start up big data teams, but can’t find enough skilled engineers who understand how to develop in PIG or HIVE or YARN or whatever is coming next in the Hadoop/MapReduce world.
This reminds me of when I used to work in the telecom software business 20 years ago and everyone was looking at technologies like DCE and CORBA to build out distributed computing environments to solve complex problems that couldn’t be solved easily on a single computing system. If you don’t know what DCE or CORBA are/were, that’s OK. It is kind of the point. They are distributed computing development platforms that failed because they were too damn hard and there just weren’t enough people who could understand how to use them effectively. Now DCE and CORBA were not trying to solve the same problems as Hadoop, but the basic point still stands, they were damn hard and the reality is that programming on a Hadoop platform is damn hard as well.
So could Hadoop fail, just like CORBA and DCE? I doubt it, for a few key reasons. First, there is a considerable amount of venture and industrial investment going into Hadoop to make it work. Not since Java has there been such a concerted effort by the industry to try to make a new technology successful. Second, much of that investment is in providing graphical development environments and applications that use the storage and compute power of Hadoop, but hide its complexity. That is what Informatica is doing with PowerCenter Big Data Edition. We are making it possible for data integration developers to parse, cleanse, transform and integrate data using Hadoop as the underlying storage and engine. But the developer doesn’t have to know anything about Hadoop. The same thing is happening at the analytics layer, at the data prep layer and at the visualization layer.
Bit by bit, software vendors are hiding the underlying complexity of Hadoop so organizations won’t have to hire an army of big data scientists to solve interesting problems. They will still need a few of them, but not so many that Hadoop will end up like those other technologies that most Hadoop developers have never even heard of.
Power to the elephant. And more later about my dinner guest and his super cute 5-year-old son.
Everyone knows that Informatica is the Data Integration company that helps organizations connect their disparate software into a cohesive and synchronous enterprise information system. The value to business is enormous and well documented in the form of use cases, ROI studies and loyalty / renewal rates that are industry-leading.
Event Processing, on the other hand, is a technology that has been around only for a few years now and has yet to reach Main Street in Systems City, IT. But if you look at how event processing is being used, it’s amazing that more people haven’t heard about it. The idea at its core (pun intended) is very simple – monitor your data / events – those things that happen on a daily, hourly, minute-ly basis, then look for important patterns that are positive or negative indicators, and then set up your systems to automatically take action when those patterns come up – like notifying a sales rep when a pattern indicates a customer is ready to buy, or stopping a transaction when your company is about to be defrauded.
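To make the monitor / detect / act loop a little more concrete, here is a minimal sketch in Python. It is not how RulePoint or any other product works. The `Event` and `PatternRule` names, the "pricing_page_view" event type, and the "3 views in 10 minutes" threshold are all made up for illustration. The rule simply counts matching events per customer inside a sliding time window and calls an action when the count reaches a threshold.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class Event:
    customer: str
    kind: str         # hypothetical event type, e.g. "pricing_page_view"
    timestamp: float  # seconds since some arbitrary start


class PatternRule:
    """Call `action` when `threshold` events of `kind` arrive within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float,
                 action: Callable[[str], None]) -> None:
        self.kind = kind
        self.threshold = threshold
        self.window = window
        self.action = action
        self._recent: Dict[str, Deque[float]] = {}  # timestamps per customer

    def observe(self, event: Event) -> None:
        if event.kind != self.kind:
            return  # not the kind of event this rule watches
        times = self._recent.setdefault(event.customer, deque())
        times.append(event.timestamp)
        # Forget events that have fallen out of the time window.
        while times and event.timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            self.action(event.customer)
            times.clear()  # reset so one burst triggers one action


# Illustrative use: "notify a sales rep" is just appending to a list here.
notified = []
rule = PatternRule("pricing_page_view", threshold=3, window=600.0,
                   action=notified.append)
for e in [Event("acme", "pricing_page_view", 0.0),
          Event("acme", "pricing_page_view", 100.0),
          Event("globex", "pricing_page_view", 150.0),
          Event("acme", "pricing_page_view", 200.0)]:
    rule.observe(e)
```

After the loop, `notified` holds only "acme", the one customer who hit the pattern. A real system would read from live event feeds and handle many rules at once, but the shape of the idea is the same: the action fires while the pattern is happening, not in a report afterwards.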
Since this is an Informatica blog, you probably have a decent set of “muscles” in place already, so why, you ask, would you need 6 pack abs? Because 6 pack abs are a good indication of a strong muscular core and are the basis of a stable and highly athletic body. The same holds for companies, because in today’s competitive business environment, you need strength, stability, and agility to compete. And since IT systems increasingly ARE the business, if your company isn’t performing as strong, lean, and mean as possible, then you can be sure your competitors will be looking to implement every advantage they can.
You may also be wondering why you would need something like Event Processing when you already have good Business Intelligence systems in place. The reality is that it’s not easy to monitor and measure useful but sometimes hidden data / event / sensor / social media sources and also to discern which patterns have meaning and which patterns may be discovered as false negatives. But the real difference is that BI usually reports to you after the fact, when the value of acting on the situation has diminished significantly.
So while muscles are important to be able to stand up and run, and good quality, strong muscles are necessary to do heavy lifting, it’s those 6 pack abs on top of it all that give you the mean lean fighting machine to identify significant threats and opportunities amongst your data, and in essence, to better compete and win.