Leo Eweani makes the case that the data tsunami is coming. “Businesses are scrambling to respond and spending accordingly. Demand for data analysts is up by 92%; 25% of IT budgets are spent on the data integration projects required to access the value locked up in this data “ore” – it certainly seems that enterprise is doing The Right Thing – but is it?”
Data is exploding within most enterprises. However, most enterprises have no clue how to manage this data effectively. While you would think that an investment in data integration would be an area of focus, many enterprises don’t have a great track record in making data integration work. “Scratch the surface, and it emerges that 83% of IT staff expect there to be no ROI at all on data integration projects and that they are notorious for being late, over-budget and incredibly risky.”
The core message from me is that enterprises need to ‘up their game’ when it comes to data integration. This recommendation is based upon the amount of data growth we’ve already experienced, and will experience in the near future. Indeed, a “data tsunami” is on the horizon, and most enterprises are ill prepared for it.
So, how do you get prepared? While many would say it’s all about buying anything and everything when it comes to big data technology, the best approach is to splurge on planning. This means defining exactly what data assets are in place now, and will be in place in the future, and how they should or will be leveraged.
To face the forthcoming wave of data, certain planning aspects and questions about data integration rise to the top:
- Performance, including data latency. Or, how quickly does the data need to flow from point or points A to point or points B? As the volume of data quickly rises, the data integration engines have got to keep up.
- Data security and governance. Or, how will the data be protected both at-rest and in-flight, and how will the data be managed in terms of controls on use and change?
- Abstraction, and removing data complexity. Or, how will the enterprise remap and re-purpose key enterprise data that may not currently exist in a well-defined and functional structure?
- Integration with cloud-based data. Or, how will the enterprise link existing enterprise data assets with those that exist on remote cloud platforms?
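The performance question above can be turned into a rough sizing exercise. The Python sketch below is only an illustration: the feed size, transfer window, engine capacity, and growth rate are hypothetical placeholders, not figures from any survey, and the function names are invented for this example.

```python
def required_throughput_mb_s(volume_gb, window_seconds):
    """Sustained rate (MB/s) needed to move volume_gb within window_seconds."""
    return volume_gb * 1000 / window_seconds


def periods_until_saturated(volume_gb, window_seconds, capacity_mb_s, growth_rate):
    """Count growth periods until the required rate exceeds engine capacity."""
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    periods = 0
    while required_throughput_mb_s(volume_gb, window_seconds) <= capacity_mb_s:
        volume_gb *= 1 + growth_rate  # data volume compounds each period
        periods += 1
    return periods


# Hypothetical feed: 500 GB moved from A to B inside a 4-hour window.
rate_today = required_throughput_mb_s(500, 4 * 3600)

# Hypothetical engine that sustains 100 MB/s, with data growing 40% per period.
periods_left = periods_until_saturated(500, 4 * 3600, 100, 0.40)

print(f"Required today: {rate_today:.2f} MB/s")
print(f"Growth periods before the engine falls behind: {periods_left}")
```

Even with generous headroom today, compounding growth eats the margin quickly, which is why latency requirements belong in the plan rather than in a later tuning exercise.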
While this may seem like a complex and risky process, think through the problems, leverage the right technology, and you can remove the risk and complexity. The enterprises that seem to fail at data integration do not follow that advice.
I suspect the explosion of data to be the biggest challenge enterprise IT will face in many years. While a few will take advantage of their data, most will struggle, at least initially. Which route will you take?
As covered by Loraine Lawson, “When it comes to data, the U.S. federal government is a bit of a glutton. Federal agencies manage on average 209 million records, or approximately 8.4 billion records for the entire federal government, according to Steve O’Keeffe, founder of the government IT network site, MeriTalk.”
Check out these stats from a December 2013 MeriTalk survey of 100 federal records and information management professionals. Among the findings:
- Only 18 percent said their agency had made significant progress toward managing records and email in electronic format, and are ready to report.
- One in five federal records management professionals say they are “completely prepared” to handle the growing volume of government records.
- 92 percent say their agency “has a lot of work to do to meet the direction.”
- 46 percent say they do not believe or are unsure about whether the deadlines are realistic and obtainable.
- Three out of four say the Presidential Directive on Managing Government Records will enable “modern, high-quality records and information management.”
I’ve been working with the US government for years, and I can tell you that these facts are pretty accurate. Indeed, the paper glut is killing productivity. Even the way they manage digital data needs a great deal of improvement.
The problem is that the issues are so massive that it’s difficult to get your arms around them. The DOD alone has hundreds of thousands of databases on-line, and most of them need to exchange data with other systems. Typically this is done using old-fashioned approaches, including “sneaker-net,” Federal Express, FTP, and creaky batch extracts and updates.
The “digital data diet,” as Loraine calls it, really needs to start with a core understanding of most of the data under management. That task alone will take years, but, at the same time, you need to form an effective data integration strategy that takes into account the dozens of data integration strategies you likely formed in the past that did not work.
The path to better data management in the government is one where you have to map out a clear path from here to there. Moreover, you need to make sure you define some successes along the way. For example, the simple reduction of manual and paper processes by 5 or 10 percent would be a great start. It’s something that would save the taxpayers billions in a short period of time.
Too many times the government gets too ambitious around data integration, and attempts to do too much in too short an amount of time. Repeat this pattern and you’ll find yourself running in quicksand, and really set yourself up for failure.
Data integration is game-changing technology. Indeed, the larger you are, the more game-changing it is. You can’t get much larger than the US government. Time to get to work.
Marketing is changing how we leverage data. In the past, we had rudimentary use of data to understand how marketing campaigns affect demand. Today, we focus on the customer. The shift is causing those in marketing to get good at data, and good at data integration. These data points are beginning to appear, as are the clear and well-defined links between data integration and marketing.
There is no better data point than Yesmail Interactive’s recent survey of 100 senior-level marketers at companies with online and offline sales models, and $10 million to more than $1 billion in revenues. My good friend, Loraine Lawson, outlined this report in a recent blog.
The resulting report, “Customer Lifecycle Engagement: Imperatives for mid-to-large companies,” (link requires sign up) shows many midsize and large B2C “marketers lack the data and technology they need for more effective segmentation.”
The report lists a few proof points:
- 86 percent of marketers say they could generate more revenue from customers if they had access to a more complete picture of customer attributes.
- 34 percent cited both poor data quality and fragmented systems as among the most significant barriers to personalized customer communications.
- On a similar note, only 46 percent were satisfied with data quality.
- 48 percent were satisfied with their web analytics integration.
- 47 percent were satisfied with their customer data integration.
- 41 percent of marketers incorporate web browsing and online behavior data in targeting criteria—although one-third said they plan to leverage this source in the future.
- Only 20 percent augment in-house customer data with third-party data at the customer level.
- Only 24 percent augment customer data at an aggregate level (such as the industry or region). Compare that to 58 percent who say they either purchase or plan to purchase third-party data to augment customer records, primarily to “validate data integrity.”
Considering this data, it’s pretty easy to draw the conclusion that those in marketing don’t have access to the customer data required to effectively do their jobs. Thus, those in enterprise IT who support marketing should take steps to leverage the right data integration processes and technologies to provide them access to the necessary analytical data.
The report includes a list of key recommendations, all of which center around four key strategic imperatives:
- Marketing data must shift from stagnant data silos to real-time data access.
- Marketing data must shift from campaign-centric to customer-centric.
- Marketing data must shift from non-integrated multichannel to integrated multichannel.
- Marketing must connect analytics, strategy and the creative.
In case you have not noticed, in order to carry out these recommendations, you need a sound focus on data integration, as well as higher-end analytical systems, which will typically leverage big data types of technologies. For those in marketing, the effective use of customer and other data is key to understanding their marketplace, which is key to focusing marketing efforts and creating demand. The links between marketing and data integration are stronger than ever.
Interesting that I found this one. Informatica announced that two Informatica customers were named Leaders in the Ventana Research 2013 Leadership Awards, which honor the leaders and pioneers who have contributed to their organizations’ successes. While many of you may think that I’m shilling for our host, these stories are actually hard to come by, and when I find them I love to make hay.
This is not a lack of interest; it’s just the fact that those successful with data integration projects are typically the unsung heroes of enterprise IT. There are almost never awards. However, those who count on enterprise IT to provide optimal data flow in support of the business processes should understand that the data integration got them there. In this case, one of the more interesting stories was around UMASS Memorial Health Care: Leadership Award for CIO George Brenckle.
“The Ventana Research Leadership Awards recognize organizations and supporting technology vendors that have effectively achieved superior results through using people, processes, information and technology while applying best practices within specific business and technology categories.” Those receiving these awards leverage the Informatica Platform, which is why the award is promoted. However, I find the approaches and the way technology is leveraged most interesting.
Just as a bit of background, UMASS Memorial Health Care undertook the Cornerstone initiative to transform the way data is used across its medical center, four community hospitals, more than 60 outpatient clinics, and the University of Massachusetts Medical School. The geographical distribution of these entities, and the different ways that they store data, are always the challenge.
When approaching these problems you need two things: First, a well-defined plan for approaching the problem, including the consumption of information from the source, the processing of that information, and the production of that information to the target. Cornerstone implements common patient, clinical and financial systems and drives information across these systems to optimize healthcare delivery and improve patient outcomes, grow the patient population and increase efficiency.
UMASS Memorial Health Care used Informatica to establish a data integration and data quality initiative to incorporate data from its clinical, financial and administrative sources and targets. Using the Informatica technology, they are able to place volatility into the domain of the integration technology, in this case, Informatica. This allows the integration administrator to add or delete systems as needed, and brings the concept of agility to a rapidly growing hospital system.
“The success of Cornerstone has resulted in primary patient panel analytics, online diabetes care, a command center for the ICU, and compliance with Medicare programs.” Indeed, the business results are apparent around the use of data integration approaches and technology, including the ability to trace this project back to an immediate and definable business benefit.
A study by Bloor Research put the failure rate for data migration projects at 38%. When you consider that a failed data migration project can temporarily hold up vital business processes, this becomes even worse news. This affects customer service, internal business processes, productivity, etc., leading to an IT infrastructure that is just not meeting the expectations of the business.
If you own one of these dysfunctional IT infrastructures, you’re not alone. Most enterprises struggle with the ability to manage the use of data within the business. Data integration becomes an ad hoc concept that is solved when needed using whatever works at the time. Moreover, the ability to manage migration and data quality becomes a lost art, and many users distrust the information coming from business systems they should rely upon.
The solution to this problem is complex. There needs to be a systemic approach to data integration that is led by key stakeholders. Several business objectives should be set prior to creating a strategy and approach, and purchasing key technologies. These include:
- Define the cost of risk in having substandard data quality.
- Define the cost of risk in not having data available to systems and humans in the business.
- Define the cost of lost strategic opportunities, such as moving into a new product line or acquiring a company.
The idea is that, by leveraging data integration approaches and technology, we’ll reduce much of the risk, which actually has a cost.
The risk of data quality is obvious to those inside and out of IT, but the damage that could occur when not having a good data integration and data quality strategy and supporting technology is perhaps much farther reaching than many think. The trick is to solve both problems at the same time, leveraging data integration technology that can deal with data quality issues as well.
Not having data available to the end users who need to see it to operate the business, as well as to the machines that need to respond to changing data, adds to the risk and thus the cost. In many enterprises, there is a culture of what I call “data starvation.” This means it’s just accepted that you can’t track orders with accurate data, you can’t pull up current customer sales information, and this is just the way things are. This is really an easy fix these days, and one dollar invested in creating a strategy or purchasing and implementing technology will come back to the business twentyfold, at least.
Finally, define the cost of lost strategic opportunities. This is a risk that many companies pay for, but it’s complex and difficult to define. The inability to get the systems communicating and sharing data around a merger, for example, means that the enterprises can’t easily take advantage of market opportunities.
I don’t know how many times I’ve heard of enterprises failing at their attempts to merge two businesses because IT could not figure out how to make the systems work and play well together. As with the other two risks, a manageable investment of time and money will remove this risk and thus the cost of the risk.
We discussed Big Data and Big Data integration last month, but the rise of Big Data and the systemic use of data integration approaches and technology continues to be a source of confusion. As with any evolution of technology, assumptions are being made that could get many enterprises into a great deal of trouble as they move to Big Data.
Case in point: The rise of big data gave many people the impression that data integration is not needed when implementing big data technology. The notion is, if we consolidate all of the data into a single cluster of servers, then the integration is systemic to the solution. Not the case.
As you may recall, we made many of the same mistakes around the rise of service oriented architecture (SOA). Don’t let history repeat itself with the rise of cloud computing. Data integration, if anything, becomes more important as new technology is layered within the enterprise.
Hadoop’s storage approach leverages a distributed file system that maps data wherever it sits in a cluster. This means that massive amounts of data reside in these clusters, and you can map and remap the data to any number of structures. Moreover, you’re able to work with both structured and unstructured data.
As covered in a recent Read Write article, the movement to Big Data does indeed come with built-in business value. “Hadoop, then, allows companies to store data much more cheaply. How much more cheaply? In 2012, Rainstor estimated that running a 75-node, 300TB Hadoop cluster would cost $1.05 million over three years. In 2008, Oracle sold a database with a little over half the storage (168TB) for $2.33 million – and that’s not including operating costs. Throw in the salary of an Oracle admin at around $95,000 per year, and you’re talking an operational cost of $2.62 million over three years – 2.5 times the cost, for just over half of the storage capacity.”
Thus, if these data points are indeed correct, Hadoop clearly enables companies to hold all of their data on a single cluster of servers. Moreover, this data really has no fixed structure. “Fixed assumptions don’t need to be made in advance. All data becomes equal and equally available, so business scenarios can be run with raw data at any time as needed, without limitation or assumption.”
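The arithmetic behind the quoted cost comparison is easy to check. The short Python sketch below uses only the figures from the Read Write quote; like the quote, it assumes the admin salary is the only operating cost added to the Oracle purchase price, which is a simplification. The per-terabyte comparison is an extra derived figure, not one the article states.

```python
# Figures as quoted from the Read Write article.
HADOOP_COST_3YR = 1_050_000      # 75-node, 300TB cluster over three years
HADOOP_TB = 300

ORACLE_PRICE = 2_330_000         # 168TB database, purchase price only
ORACLE_TB = 168
ADMIN_SALARY_PER_YEAR = 95_000   # approximate Oracle admin salary
YEARS = 3


def cost_per_tb(total_cost, terabytes):
    """Three-year cost divided by storage capacity."""
    return total_cost / terabytes


# Assumption (same as the quote): salary is the only added operating cost.
oracle_cost_3yr = ORACLE_PRICE + ADMIN_SALARY_PER_YEAR * YEARS

cost_ratio = oracle_cost_3yr / HADOOP_COST_3YR
capacity_ratio = ORACLE_TB / HADOOP_TB

print(f"Oracle three-year cost: ${oracle_cost_3yr:,}")
print(f"Cost ratio (Oracle / Hadoop): {cost_ratio:.2f}")
print(f"Capacity ratio (Oracle / Hadoop): {capacity_ratio:.2f}")
print(f"Hadoop cost per TB: ${cost_per_tb(HADOOP_COST_3YR, HADOOP_TB):,.0f}")
print(f"Oracle cost per TB: ${cost_per_tb(oracle_cost_3yr, ORACLE_TB):,.0f}")
```

The totals line up with the article's "2.5 times the cost, for just over half of the storage capacity." Normalized per terabyte, the gap in this comparison is wider still.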
While this process may look like data integration to some, the heavy lifting around supplying these clusters with data is always a data integration solution, leveraging the right enabling technology. Indeed, consider what’s required around the movement to Big Data systems and you’ll realize why additional stress and strain is placed upon the data integration solution. A Big Data strategy that leverages Big Data technology increases, not decreases, the need for a solid data integration strategy and a sound data integration technology solution.
Big Data is a killer application that most enterprises should at least consider. The business strategic benefits are crystal clear, and the movement around finally being able to see and analyze all of your business data in real time is underway for most of the Global 2000 and the government. However, you won’t achieve these objectives without a sound approach to data integration, and a solid plan to leverage the right data integration technology.
As reported here, “‘Every 30 to 40 percent increase in data volume usually forces an organization to re-look at infrastructure,’ commented Venkat Lakshminarasimha, Global Big Data Integration Specialist, Informatica. He was addressing a gathering of information management professionals from the public sector in a workshop conducted by Informatica on maximizing return on data, as part of the activities surrounding the FutureGov Singapore Forum 2013.”
800 exabytes of potentially useful data were collected in the US in 2009, and 35 zettabytes are expected by 2020. “From a velocity perspective, some organizations have 50GB of real-time data streaming in per second at peak times — this means that you need to look at scalability of infrastructure, and Big Data solutions,” said Venkat.
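To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units (1 zettabyte = 1,000 exabytes, 1 terabyte = 1,000 gigabytes) and treats the 2009 and 2020 figures as directly comparable, which may not hold if they measure different populations of data. Read the growth rate as an order-of-magnitude indicator only.

```python
EB_PER_ZB = 1000  # decimal units: 1 zettabyte = 1,000 exabytes


def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes start to end in the given years."""
    return (end / start) ** (1 / years) - 1


def volume_per_hour_tb(gb_per_second):
    """Terabytes arriving in one hour at a sustained GB/s rate."""
    return gb_per_second * 3600 / 1000


# 800 EB in 2009 growing to 35 ZB by 2020.
growth = implied_annual_growth(800, 35 * EB_PER_ZB, 2020 - 2009)

# 50 GB per second of real-time data, sustained for an hour at peak.
peak_hour_tb = volume_per_hour_tb(50)

print(f"Implied annual growth: {growth:.1%}")
print(f"Peak-hour volume at 50 GB/s: {peak_hour_tb:.0f} TB")
```

Under these assumptions the implied growth is roughly 40 percent per year. Set against the "every 30 to 40 percent increase" rule of thumb quoted earlier, that suggests an infrastructure rethink about once a year.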
The fact of the matter is that the rise of big data is only going to make the massive growth of data even more massive. At the same time, we need to figure out ways to get the data from point A (or points A), to point B (or points B). Moreover, we need to do so in a manner that’s both scalable and resilient.
The core issue is that most enterprises are not at all ready for this kind of growth in data. While many point to lack of scalable storage, the reality is that the amount of data required to move between the data stores will quickly saturate the current approaches to data integration, as well as the enabling technology.
Considering what’s been stated above, what is an enterprise supposed to do to prepare for what’s sure to be called the data avalanche of 2014? It starts with an approach, and the right technology.
The real challenge is to create the right approach to data integration, looking at the changing requirements around the use of big data. This includes the ability to deal with both structured and unstructured data, the ability to integrate data leveraging distributed processing, and, most importantly, the ability to scale to an ever-increasing load.
The approach is requirements-driven. Those charged with managing data and data integration should have a complete understanding of where the growth in data will occur. Then, using this as a jumping-off point, align these requirements with a data storage and data integration architecture.
However, that’s only part of the story. You need to select a data integration solution that can provide the core integration services, such as transformation, translation, interface mediation, security, governance, etc. The toughest part is to select and deploy technology that can provide the required scalability. This means providing data integration at speeds such that all core business processes are able to access all of the information they need to see, when they need to see it, and at an increasing volume.
The truth of the matter is that few out there understand what’s coming. While data is expected to grow, I don’t think we understand how much. Moreover, we don’t understand how critical data integration is to the strategy.
According to Doug Henschen, Executive Editor at InformationWeek, “Despite the weak economy and zero growth in many IT salary categories, business intelligence (BI), analytics, information-integration and data warehousing professionals are seeing a slow-but-steady rise in income.”
The consultancy KPMG surveyed nearly 700 executives located around the world, half of whom are already involved with cloud initiatives. Those who dared try their first cloud computing project found out that it’s harder and more expensive than they anticipated.
33 percent of all executives complained about the higher costs for three areas: Implementation, transition and integration. For 31 percent, integrating cloud services with their on-premise applications and systems turned out to be more complex than expected.
In a recent Sand Hill article, Jeff Kaplan, the managing director of THINKstrategies, reports on the recent and changing state of data integration with the addition of cloud computing. “One of the ongoing challenges that continues to frustrate businesses of all sizes is data integration, and that issue has only become more complicated with the advent of the cloud. And, in the brave new world of the cloud, data integration must morph into a broader set of data management capabilities to satisfy the escalating needs of today’s business.”