Back in 2004, we saw the rapid growth of SaaS providers such as Salesforce.com. However, there was typically no consistent data integration strategy to go along with the use of SaaS. In many instances, SaaS-delivered applications became the new data silos in the enterprise, silos that lacked a sound integration plan and integration technology.
Ten years later, we’ve reached the point where we can solve business problems with SaaS, as well as the data integration problems that the use of SaaS creates. However, we typically lack the knowledge and understanding needed to use data integration technology effectively within an enterprise to integrate SaaS problem domains.
Lawson looks at both sides of the SaaS integration argument. “Surveys certainly show that integration is less of a concern for SaaS than in the early days, when nearly 88 percent of SaaS companies said integration concerns would slow down adoption and more than 88 percent said it’s an important or extremely important factor in winning new customers.”
Again, while we’ve certainly gotten better at integration, we’re nowhere near being out of the woods. “A Dimensional Research survey of 350 IT executives showed that 67 percent cited data integration problems as a challenge with SaaS business applications. And as with traditional systems, integration can add hidden costs to your project if you ignore it.”
As I’ve stated many times in this blog, integration requires a bit of planning and the use of solid technology. While this does require some extra effort and money, the return on this work is huge.
SaaS integration requires that you take a bit of a different approach than traditional enterprise integration. SaaS systems typically place your data behind well-defined APIs that can be accessed directly or through a data integration technology. While the information can be consumed by anything that can invoke an API, enterprises still have to deal with structure and content differences, and that’s typically best handled using the right data integration technology.
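To make the structure and content differences concrete, here is a minimal Python sketch of the kind of mapping a data integration layer performs on records pulled from a SaaS API. The field names (`Id`, `Name`, `BillingAddress`, `CreatedDate`), the `Customer` target type, and the country-code table are hypothetical and chosen for illustration, not taken from any specific vendor's API. The sketch also assumes the API response has already been fetched and parsed from JSON.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical content mapping: the SaaS application stores full country
# names, while internal systems are assumed to use ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {"United States": "US", "Germany": "DE", "Japan": "JP"}


@dataclass
class Customer:
    """Internal (target) representation of a customer."""
    customer_id: str
    name: str
    country_code: Optional[str]  # None when the source value has no known mapping
    created: date


def map_saas_account(record: dict) -> Customer:
    """Map one record returned by a SaaS API (already parsed from JSON) to the internal schema.

    Structure differences: the address is nested and the field names differ.
    Content differences: country name -> code, ISO 8601 timestamp -> date.
    """
    address = record.get("BillingAddress") or {}
    return Customer(
        customer_id=record["Id"],
        name=record["Name"].strip(),
        country_code=COUNTRY_CODES.get(address.get("Country", "")),
        # Keep only the YYYY-MM-DD part of a timestamp like "2014-06-01T12:30:00Z".
        created=date.fromisoformat(record["CreatedDate"][:10]),
    )


sample = {
    "Id": "001A000001",
    "Name": "  Acme Corp ",
    "BillingAddress": {"Country": "Germany"},
    "CreatedDate": "2014-06-01T12:30:00Z",
}
print(map_saas_account(sample))
```

Keeping mappings like `COUNTRY_CODES` in a data integration layer, rather than scattering them across hand-coded interfaces, lets every target system consume the same SaaS data consistently. Unmapped values come back as `None` so they can be flagged rather than silently guessed.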
Other things to consider, again often overlooked, are the need for both data governance and data security around your SaaS integration solution. There should be a centralized control mechanism to support the proper management and security of the data, as well as a mechanism to deal with the data quality issues that often emerge when consuming data from any cloud computing service.
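A centralized data quality mechanism can start as a shared set of rules applied to every record consumed from a cloud service, with failing records quarantined instead of loaded. The following is a minimal sketch under that assumption; the rule set, field names, and quarantine behavior are hypothetical examples, not a specific product's feature.

```python
import re
from typing import Callable

# Hypothetical, centrally maintained data quality rules:
# (field name, check that returns True when the value is acceptable, issue message).
RULES: list[tuple[str, Callable[[object], bool], str]] = [
    ("customer_id", lambda v: bool(v), "missing customer_id"),
    ("email",
     lambda v: isinstance(v, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
     "malformed email"),
    ("amount", lambda v: isinstance(v, (int, float)) and v >= 0, "negative or non-numeric amount"),
]


def check_record(record: dict) -> list[str]:
    """Return the data quality issues found in one record (an empty list means clean)."""
    return [f"{name}: {msg}" for name, ok, msg in RULES if not ok(record.get(name))]


def partition(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split incoming cloud records into clean ones and quarantined ones (with their issues)."""
    clean, quarantined = [], []
    for rec in records:
        issues = check_record(rec)
        if issues:
            quarantined.append((rec, issues))
        else:
            clean.append(rec)
    return clean, quarantined


incoming = [
    {"customer_id": "c1", "email": "ann@example.com", "amount": 10.5},
    {"customer_id": "", "email": "not-an-email", "amount": -3},
]
clean, quarantined = partition(incoming)
print(len(clean), "clean;", len(quarantined), "quarantined")
```

Because the rules live in one place, governance can review and change them once, and every integration flow that calls `partition` picks up the change.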
The reality is that SaaS is here to stay. Even enterprise software players that put off the move to SaaS-delivered systems are now standing up SaaS offerings. The economics around the use of SaaS are just way too compelling. However, as SaaS-delivered systems become more commonplace, so will new silos. This will not be an issue if you leverage the right SaaS integration approach and technology. What will your approach be?
It’s true. Data integration is a whole new game compared to five years ago, or, in some organizations, five minutes ago. The right approaches to data integration continue to evolve around a few principal forces. First, the growth of cloud computing, as pointed out by Stafford. Second, the growing use of big data systems. Third, the emerging use of data as a strategic asset for the business.
These forces combine to drive us to the understanding that old approaches to data integration won’t provide the value that they once did. As someone who was a CTO of three different data integration companies, I’ve seen these patterns change over the time that I was building technology, and that change has accelerated in the last 7 years.
The core opportunities lie with the enterprise architect, and their ability to drive an understanding of the value of data integration, as well as drive change within their organization. After all, they, or the enterprise’s CTOs and CIOs (whoever makes decisions about technology approaches), are supposed to drive the organization in the right technical directions that will provide the best support for the business. While most enterprise architects follow the latest hype, such as cloud computing and big data, many have missed the underlying data integration strategies and technologies that will support these changes.
“The integration challenges of cloud adoption alone give architects and developers a once in a lifetime opportunity to retool their skillsets for a long-term, successful career, according to both analysts. With the right skills, they’ll be valued leaders as businesses transition from traditional application architectures, deployment methodologies and sourcing arrangements.”
The problem is that, while most agree that data integration is important, they typically don’t understand what it is, and the value it can bring. These days, many developers live in a world of instant updates. With emerging DevOps approaches and infrastructure, they really don’t grasp the need to share data between application or database silos, or the mechanisms required to do so. In many instances, they resort to coding interfaces between source and target systems. This leads to brittle and unreliable integration solutions, which hurt, rather than help, new cloud application and big data deployments.
The message is clear: Those charged with defining technology strategies within enterprises need to also focus on data integration approaches, methods, patterns, and technologies. Failing to do so means that the investments made in new and emerging technology, such as cloud computing and big data, will fail to provide the anticipated value. At the same time, enterprise architects need to be empowered to make such changes. Most enterprises are behind on this effort. Now it’s time to get to work.
The article cites research from Ovum that predicts many enterprises will begin moving toward data integration, driven largely by the rise of cloud computing and big data. However, enterprises need to invest both in modernizing their existing data management infrastructure and in data integration technology. “All of these new investments will push the middleware software market up 9 percent to a $16.3 billion industry, Information Management reports.” This projection is for 2015.
I suspect that’s a bit conservative. In my travels, I see much more interest in data integration strategies, approaches, and technology as cloud computing continues to grow and as enterprises better understand the strategic use of data. So, I would put the growth at 15 percent for 2015.
There are many factors driving this growth, beyond mere interest in cloud computing and big data.
The first consideration is that data is more strategic than initially understood. While businesses have always considered data a huge asset, only in the last few years have they seen the true value of understanding what’s going on inside and outside of their business.
Manufacturing companies want to see the current state of production, as well as production history. Management can now use that data to predict trends that need attention, such as future issues around employee productivity, or a piece of equipment that is likely to fail and the impact of that failure on revenue. Healthcare companies are learning how to better monitor patient health, such as by spotting likely health problems before they are diagnosed, or by leveraging large data sets to understand when patterns emerge around health issues, such as areas of the country that are more prone to asthma based upon air quality.
Second, there is the need to deal with compliance issues. The new health care regulations, or even the new regulations around managing a publicly traded company, require a great deal of data management, including data integration.
As these laws emerge and are altered over time, the reporting requirements become more complex and far-reaching than they were before. Those who want to avoid fines, or even stock drops caused by reporting mistakes, are paying close attention to this area.
Finally, there is an expectation from customers and employees that you will have a good handle on your data. Ten years ago you could tell a customer on the phone that you needed to check different systems to answer their question. Those days are over. Today’s customers and employees want immediate access to the data they need, and there is no good excuse for not being able to produce that data. If you can’t, your competition will.
The interest in data integration will experience solid growth in 2015, around cloud and big data, for sure. However, other factors will drive this growth, and enterprises will finally understand that data integration is core to an IT strategy, and should never be an afterthought.
According to the article, in Hamilton County, Ohio, it’s not unusual to see kids from the same neighborhoods coming to the hospital for asthma attacks. Researchers wanted to know whether it was fact or mistaken perception that an unusually high number of children in the same neighborhoods were experiencing asthma attacks. The next step was to review existing data to determine the extent of the issue, and perhaps how to solve the problem altogether.
“The researchers studied 4,355 children between the ages of 1 and 16 who visited the emergency department or were hospitalized for asthma at Cincinnati Children’s between January 2009 and December 2012. They tracked those kids for 12 months to see if they returned to the ED or were readmitted for asthma.”
Not only were the researchers able to determine a sound correlation between the two data sets, but they were also able to advance the research to predict which kids were at high risk based upon where they live. Thus, some of the causes and effects have been identified.
This came about when researchers began thinking outside the box about traditional and non-traditional medical data. In this case, they integrated housing and census data with the data from the diagnosis and treatment of the patients. These are data sets unlikely to find their way to each other, but together they have a meaning that is much more valuable than if they had stayed in their respective silos.
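At its core, this kind of integration is a join on a shared geographic key. The sketch below assumes both data sets carry a census tract identifier; that is an assumption for illustration, since the article does not describe how the researchers linked the data. The records and attribute names are made up and are not from the study.

```python
from collections import defaultdict

# Toy records made up for illustration only; these are NOT data from the study.
patients = [
    {"patient_id": 1, "tract": "tract-A", "readmitted": True},
    {"patient_id": 2, "tract": "tract-A", "readmitted": False},
    {"patient_id": 3, "tract": "tract-B", "readmitted": False},
]
# Hypothetical housing/census attributes, keyed by the same census tract identifier.
tracts = {
    "tract-A": {"poverty_rate": 0.42, "pct_pre1950_housing": 0.61},
    "tract-B": {"poverty_rate": 0.08, "pct_pre1950_housing": 0.15},
}


def join_by_tract(patients: list[dict], tracts: dict[str, dict]) -> list[dict]:
    """Enrich each patient record with the attributes of the census tract they live in."""
    return [{**p, **tracts.get(p["tract"], {})} for p in patients]


def readmission_rate_by_tract(patients: list[dict]) -> dict[str, float]:
    """Fraction of patients in each tract who returned to the ED or were readmitted."""
    counts = defaultdict(lambda: [0, 0])  # tract -> [readmitted, total]
    for p in patients:
        counts[p["tract"]][0] += int(p["readmitted"])
        counts[p["tract"]][1] += 1
    return {tract: r / n for tract, (r, n) in counts.items()}


enriched = join_by_tract(patients, tracts)
rates = readmission_rate_by_tract(patients)
print(rates)
```

With patient outcomes and tract attributes side by side, a researcher could then test whether readmission rates track attributes such as poverty rate or housing age, a question neither silo can answer alone.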
“Non-traditional medical data integration has begun to take place in some medical collaborative environments already. The New York-Presbyterian Regional Health Collaborative created a medical village, which ‘goes beyond the established patient-centered medical home mode.’ It not only connects an academic medical center with a large ambulatory network, medical homes, and other providers with each other, but community resources such as school-based clinics and specialty-care centers (the ones that are a part of NYP’s network).”
The fact of the matter is that data is the key to understanding what the heck is going on when clusters of sick people begin to emerge. While researchers and doctors can treat individual patients, they may not have a good understanding of the larger issues at play; in this case, poor air quality in poor neighborhoods. With the data integrated, they understand what problem needs to be corrected.
The universal sharing of data is really the larger solution here, but one that won’t be approached without a common understanding of the value, and funding. As we pass laws around the administration of health care, as well as how data is to be handled, perhaps it’s time we look at what the data actually means. This requires a massive deployment of data integration technology, and the fundamental push to share data with a central data repository, as well as with health care providers.
California reported a total of 167 data breaches in 2013, up 28 percent from 2012. Two major data breaches caused most of this uptick: the Target attack that was reported in December 2013, and the LivingSocial attack that occurred in April 2013. This year, you can add the Home Depot data breach to that list, as well as the recent breach at the US Post Office.
So, what the heck is going on? And how does this news impact data integration? Should we be concerned as we place more and more data on public clouds, or within big data systems?
Almost all of these breaches were made possible by traditional systems with security technology and security operations that fell far enough behind that outside attackers found a way in. You can count on many more of these attacks, as enterprises and governments don’t look at security as what it is: an ongoing activity that may require massive and systemic changes to make sure the data is properly protected.
As enterprises and government agencies stand up cloud-based systems, and new big data systems, either inside (private) or outside (public) of the enterprise, there are some emerging best practices around security that those who deploy data integration should understand. Here are a few that should be on the top of your list:
First, start with Identity and Access Management (IAM) and work your way backward. These days, most cloud and non-cloud systems are complex distributed systems. That means IAM is clearly the best security model and best practice to follow with the emerging use of cloud computing.
The concept is simple: provide a security approach and technology that enables the right individuals to access the right resources, at the right times, for the right reasons. The concept follows the principle that everything and everyone gets an identity. This includes humans, servers, APIs, applications, data, etc. Once each identity is established and verified, it’s just a matter of defining which identities can access other identities, and creating policies that define the limits of those relationships.
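The identity-and-policy idea can be sketched in a few lines of Python: every actor and resource is an `Identity`, and access is denied unless a `Policy` explicitly grants a given action, optionally only within a daily time window (the "right times"). The class names, identity kinds, and example identities are hypothetical. Real IAM services add authentication, roles, and auditing on top of this core check, and this sketch does not handle windows that cross midnight.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Identity:
    """Everything and everyone gets an identity: people, servers, APIs, applications, data."""
    name: str
    kind: str  # e.g. "human", "server", "api", "application", "data"


@dataclass(frozen=True)
class Policy:
    """Grants `subject` permission to perform `action` on `resource` within a daily time window."""
    subject: Identity
    resource: Identity
    action: str
    start: time = time(0, 0)
    end: time = time(23, 59, 59)


def is_allowed(policies: list[Policy], subject: Identity, resource: Identity,
               action: str, at: time) -> bool:
    """Deny by default; allow only if some policy explicitly covers this request."""
    return any(
        p.subject == subject and p.resource == resource and p.action == action
        and p.start <= at <= p.end
        for p in policies
    )


# Hypothetical identities and policies.
etl_job = Identity("nightly-etl", "application")
crm_data = Identity("crm.customers", "data")
analyst = Identity("jdoe", "human")

policies = [
    # The ETL job may read CRM data, but only during its 01:00-04:00 batch window.
    Policy(etl_job, crm_data, "read", start=time(1, 0), end=time(4, 0)),
    # The analyst may read CRM data at any time, but has no write permission.
    Policy(analyst, crm_data, "read"),
]

print(is_allowed(policies, etl_job, crm_data, "read", time(2, 30)))
```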
Second, work with your data integration provider to identify solutions that work best with their technology. Most data integration solutions address security in one way, shape, or form. Understanding those solutions is important to secure data at rest and in flight.
Finally, splurge on monitoring and governance. Many of the issues around this growing number of breaches stem from system managers’ inability to spot and stop attacks. Creative approaches to monitoring system and network utilization, as well as data access, will allow those in IT to spot most attacks and correct the issues before they ‘go nuclear.’ Typically, a complete breach is preceded by an increasing number of breach attempts.
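One simple monitoring technique for catching escalating breach attempts is a sliding-window counter of failed access attempts per source. The sketch below assumes failures arrive as (source, timestamp) events from logs; the class name, window length, and threshold are hypothetical values for illustration, not tuned recommendations.

```python
from collections import deque


class FailedAccessMonitor:
    """Flags a source whose failed access attempts within a sliding time window exceed a threshold.

    The default window and threshold are hypothetical values, not tuned recommendations.
    """

    def __init__(self, window_seconds: float = 300.0, threshold: int = 20):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._attempts: dict[str, deque] = {}  # source -> timestamps of recent failures

    def record_failure(self, source: str, timestamp: float) -> bool:
        """Record one failed attempt (timestamps in seconds, non-decreasing per source).

        Returns True if the source should be flagged for investigation.
        """
        recent = self._attempts.setdefault(source, deque())
        recent.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.threshold


monitor = FailedAccessMonitor(window_seconds=60, threshold=3)
for t in (0, 10, 20, 30):
    flagged = monitor.record_failure("203.0.113.7", t)
print(flagged)
```

In practice, a flag like this would feed an alerting or governance workflow, so the spike is investigated while it is still a string of attempts rather than a completed breach.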
The issue and burden of security won’t go away. Systems will continue to move to public and private clouds, and data will continue to migrate to distributed big data types of environments. And that means the need for data integration and data security will continue to explode.
“The NIH multi-institute awards constitute an initial investment of nearly $32 million in fiscal year 2014 by NIH’s Big Data to Knowledge (BD2K) initiative and will support development of new software, tools and training to improve access to these data and the ability to make new discoveries using them, NIH said in its announcement of the funding.”
The grants will address issues around Big Data adoption, including:
- Locating data and the appropriate software tools to access and analyze the information.
- Lack of data standards, or low adoption of standards across the research community.
- Insufficient policies to facilitate data sharing while protecting privacy.
- Unwillingness to collaborate that limits the data’s usefulness in the research community.
Among the tasks funded is the creation of a “Perturbation Data Coordination and Integration Center.” The center will provide support for data science research that focuses on interpreting and integrating data from different data types and databases. In other words, it will make sure the data moves to where it should move, in order to provide access to information that’s needed by the research scientist. Fundamentally, it’s data integration practices and technologies.
This is very interesting from the standpoint that the movement into big data systems often drives a reevaluation of, or even new interest in, data integration. As data becomes strategically important, the need to provide core integration services becomes even more important.
The project at the NIH will be interesting to watch, as it progresses. These are the guys who come up with the new paths to us being healthier and living longer. The use of Big Data provides the researchers with the advantage of having a better understanding of patterns of data, including:
- Patterns of symptoms that lead to the diagnosis of specific diseases and ailments. Doctors may get these data points one at a time. When unstructured or structured data exists, researchers can find correlations, and thus provide better guidelines to physicians who see the patients.
- Patterns of cures that are emerging around specific treatments. The ability to determine what treatments are most effective, by looking at the data holistically.
- Patterns of failure. When the outcomes are less than desirable, what seems to be a common issue that can be identified and resolved?
Of course, the uses of big data technology are limitless, when considering the value of knowledge that can be derived from petabytes of data. However, it’s one thing to have the data, and another to have access to it.
Data integration should always be systemic to all big data strategies, and the NIH clearly understands this to be the case. Thus, they have funded data integration along with the expansion of their big data usage.
Most enterprises will follow much the same path in the next 2 to 5 years. Information provides a strategic advantage to businesses. In the case of the NIH, it’s information that can save lives. Can’t get much more important than that.
Key findings from the report include:
- 65% of organizations cite data processing and integration as hampering distribution capability, with nearly half claiming their existing software and ERP is not suitable for distribution.
- Nearly two-thirds of enterprises have some form of distribution process, involving products or services.
- More than 80% of organizations have at least some problem with product or service distribution.
- More than 50% of CIOs in organizations with distribution processes believe better distribution would increase revenue and optimize business processes, with a further 38% citing reduced operating costs.
The core findings: “With better data integration comes better automation and decision making.”
This report is one of many I’ve seen over the years that come to the same conclusion. Most of those involved with the operations of the business don’t have access to the key data points they need, so they can’t automate tactical decisions, and also cannot “mine” the data, in terms of understanding the true state of the business.
The more businesses deal with building and moving products, the more data integration becomes imperative. As stated in this survey, as well as others, the large majority cite “data processing and integration as hampering distribution capabilities.”
Of course, these issues go well beyond Australia. Most enterprises I’ve dealt with have some gap between the need to share key business data to support business processes and decision support, and what currently exists in terms of data integration capabilities.
The focus here is on the multiple values that data integration can bring. This includes:
- The ability to track everything as it moves from manufacturing, to inventory, to distribution, and beyond. You can bind this tracking to core business processes, such as automatically reordering parts to make more products to fill inventory.
- The ability to see into the past, and to see into the future. The emerging approaches to predictive analytics allow businesses to finally see into the future, as well as to see what went truly right and truly wrong in the past.
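Binding inventory tracking to automatic reordering is often done with a reorder-point rule: reorder when the inventory position (on hand plus on order) falls to the expected usage over the supplier lead time plus safety stock. The sketch below uses that standard textbook rule as an illustration; it is not drawn from the report, and the `PartStock` fields and order-up-to level are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartStock:
    """Tracked inventory state for one part (all fields hypothetical)."""
    part_id: str
    on_hand: int
    on_order: int
    avg_daily_usage: float  # units consumed per day by production
    lead_time_days: float   # supplier lead time
    safety_stock: int       # buffer against demand or supply variability

    @property
    def reorder_point(self) -> float:
        # Expected usage while waiting for a delivery, plus the safety buffer.
        return self.avg_daily_usage * self.lead_time_days + self.safety_stock


def reorder_quantity(stock: PartStock, order_up_to: int) -> int:
    """Units to reorder: 0 while the inventory position is above the reorder point,
    otherwise enough to bring the position back up to `order_up_to`."""
    position = stock.on_hand + stock.on_order
    if position > stock.reorder_point:
        return 0
    return max(order_up_to - position, 0)


bolts = PartStock("bolt-m8", on_hand=120, on_order=0,
                  avg_daily_usage=40, lead_time_days=3, safety_stock=50)
print(bolts.reorder_point, reorder_quantity(bolts, order_up_to=400))
```

The rule only works if `on_hand`, `on_order`, and `avg_daily_usage` are fed from manufacturing, inventory, and purchasing systems in near real time, which is where the data integration comes in.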
While data integration technology has been around for decades, most businesses that both manufacture and distribute products have not taken full advantage of this technology. The reasons range from perceptions around affordability, to the skills required to maintain the data integration flow. However, the truth is that you really can’t afford to ignore data integration technology any longer. It’s time to create and deploy a data integration strategy, using the right technology.
This survey is just an instance of a pattern. Data integration was considered optional in the past. With today’s emerging notions around the strategic use of data, clearly, it’s no longer an option.
With that said, the basic approaches to consider are from the top-down, or the bottom-up. You can be successful with either approach. However, there are certain efficiencies you’ll gain with a specific choice, and it could significantly reduce the risk and cost. Let’s explore the pros and cons of each approach.
Approaching data integration from the top-down means moving from the high-level integration flows down to the data semantics. Thus, you define an approach, perhaps even a toolset (using requirements), and then define the flows, which are decomposed down to the raw data.
The advantages of this approach include:
The ability to spend time defining the higher levels of abstraction without being limited by the underlying integration details. Those charged with designing the integration flows are typically concerned with how they will deal with the underlying sources and targets; this approach means that they don’t have to deal with that issue until later, as they break down the flows.
The disadvantages of this approach include:
The data integration architect does not consider the specific needs of the source or target systems, in many instances, and thus some rework around the higher level flows may have to occur later. That causes inefficiencies, and could add risk and cost to the final design and implementation.
For the most part, the bottom-up approach is the one that most choose for data integration. Indeed, I use this approach about 75 percent of the time. The process is to start from the native data in the sources and targets, and work your way up to the integration flows. This typically means that those charged with designing the integration flows are more concerned with the underlying data semantic mediation than with the flows.
The advantages of this approach include:
It’s typically a more natural and traditional way of approaching data integration. Called “data-driven” integration design in many circles, this initially deals with the details, so by the time you get up to the integration flows there are few surprises, and there’s not much rework to be done. It’s a bit less risky and less expensive, in most cases.
The disadvantages of this approach include:
Starting with the details means that you could get so involved in the details that you miss the larger picture, and the end state of your architecture appears to be poorly planned, when all is said and done. Of course, that depends on the types of data integration problems you’re looking to solve.
No matter which approach you leverage, with some planning and some strategic thinking, you’ll be fine. However, there are different paths to the same destination, and some paths are longer and less efficient than others. As you pick an approach, learn as you go, and adjust as needed.
As covered in Loraine Lawson’s blog, MeriTalk surveyed federal government IT professionals about their use of cloud computing. As it turns out, “89 percent out of 153 surveyed expressed ‘some apprehension about losing control of their IT services,’ according to MeriTalk.”
Loraine and I agree that what the survey says about the government’s data integration, management, and governance is that they don’t seem to be very good at cloud data management…yet. Some of the other gruesome details include:
- 61 percent do not have quality, documented metadata.
- 52 percent do not have well understood data integration processes.
- 50 percent have not identified data owners.
- 49 percent do not have known systems of record.
“Overall, respondents did not express confidence about the success of their data governance and management efforts, with 41 percent saying their data integration management efforts were some degree of ‘not successful.’ This lead MeriTalk to conclude, ‘Data integration and remediation need work.’”
The problem with the government is that data integration, data governance, data management, and even data security have not been priorities. The government has a huge amount of data to manage, and they have not taken the necessary steps to adopt the best practices and technology that would allow them to manage it properly.
Now that everyone is moving to the cloud, the government included, questions are popping up about the proper way to manage data within the government, from the traditional government enterprises to the public cloud. Clearly, there is much work to be done to get the government ready for the cloud, or even ready for emerging best practices around data management and data integration.
If the government is to move in the right direction, they must first come to terms with the data. This means understanding where the data is, what it does, who owns it, access mechanisms, security, governance, etc., and apply this understanding holistically to most of the data under management.
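Coming to terms with the data can begin with a catalog that records, for each copy of a data set, where it lives, what it is, who owns it, how it is accessed, and how it is classified. This minimal sketch assumes such a catalog and reports the kinds of gaps the MeriTalk survey highlighted: missing metadata, unidentified owners, and no known system of record. The `DatasetEntry` structure and the example entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetEntry:
    """One entry in a hypothetical data catalog: where a copy of a data set lives and who owns it."""
    name: str                                   # logical data set name
    location: str                               # system, database, or cloud service holding this copy
    description: str = ""                       # documented metadata: what the data is and does
    owner: Optional[str] = None                 # accountable data owner
    system_of_record: bool = False              # is this copy the authoritative one?
    access: list[str] = field(default_factory=list)  # access mechanisms, e.g. "REST API", "JDBC"
    classification: Optional[str] = None        # security classification


def catalog_gaps(entries: list[DatasetEntry]) -> dict[str, list[str]]:
    """Report governance gaps per copy, and logical data sets with no known system of record."""
    gaps: dict[str, list[str]] = {}
    for e in entries:
        problems = []
        if not e.description:
            problems.append("no documented metadata")
        if e.owner is None:
            problems.append("no identified owner")
        if e.classification is None:
            problems.append("no security classification")
        if problems:
            gaps[f"{e.name}@{e.location}"] = problems
    with_sor = {e.name for e in entries if e.system_of_record}
    for name in sorted({e.name for e in entries} - with_sor):
        gaps.setdefault(name, []).append("no known system of record")
    return gaps


entries = [
    DatasetEntry("permits", "agency-db-1", "Building permits", owner="Permits Office",
                 system_of_record=True, access=["JDBC"], classification="internal"),
    DatasetEntry("permits", "cloud-bucket"),
    DatasetEntry("grants", "legacy-mainframe", "Grant awards", owner="Grants Office",
                 classification="internal"),
]
for key, problems in catalog_gaps(entries).items():
    print(key, problems)
```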
The problem within the government is that the data is so complex, distributed, and, in many cases, unique, that it’s difficult for the government to keep good track of the data. Moreover, the way the government does procurement, typically in silos, leads to a much larger data integration problem. I have worked with government agencies that had over 5,000 siloed systems, each with its own database or databases, and most did not leverage data integration technology to exchange data.
There are ad-hoc data integration approaches and some technology in place, but nowhere close to what’s needed to support the amount and complexity of data. Now that government agencies are looking to move to the cloud, the issues around data management are beginning to be better understood.
So, what’s the government to do? This is a huge issue that can’t be fixed overnight. There should be incremental changes that occur over the next several years. This also means allocating more resources to data management and data integration than have been allocated in the past, and moving them much higher up the priority lists.
These are not insurmountable problems. However, they require a great deal of focus before things will get better. The movement to the cloud seems to be providing that focus.
When it comes to cloud-based data analytics, a recent study by Ventana Research (as found in Loraine Lawson’s recent blog post) provides a few interesting data points. The study reveals that 40 percent of respondents cited lowered costs as a top benefit, improved efficiency was a close second at 39 percent, and better communication and knowledge sharing also ranked highly at 34 percent.
Ventana Research also found that organizations cite a unique and more complex reason to avoid cloud analytics and BI. Legacy integration work can be a major hindrance, particularly when BI tools are already integrated with other applications. In other words, it’s the same old story:
The ability to deal with existing legacy systems when moving to concepts such as big data or cloud-based analytics is critical to the success of any enterprise data analytics strategy. However, most enterprises don’t focus on data integration as much as they should, and hope that they can solve the problems using ad-hoc approaches.
You can’t make sense of data that you can’t see.
Ad-hoc approaches rarely work as well as they should, if at all. Thus, any investment made in data analytics technology is often diminished because the BI tools or applications that leverage analytics can’t see all of the relevant data. As a result, the available data tells only part of the story, those who use data analytics stop relying on the information, and that means failure.
What’s frustrating to me about this issue is that the problem is easily solved. Those in the enterprise charged with standing up data analytics should put a plan in place to integrate new and legacy systems. As part of that plan, there should be a common understanding of business concepts/entities such as customer, sale, and inventory, and all of the data related to these concepts/entities must be visible to the data analytics engines and tools. This requires a data integration strategy and technology.
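A minimal sketch of that common understanding: each source system's fields are mapped onto a shared vocabulary for one business entity (here, customer), and the mapped records are merged into a single view that analytics tools can query. The system names, field names, and mappings are hypothetical, and the sketch assumes the systems already share a customer ID; real deployments usually need record matching and deduplication on top of this.

```python
# Hypothetical field mappings from each source system to the canonical "customer" entity.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "legacy_erp": {"CUST_NO": "customer_id", "CUST_NM": "name", "REGION_CD": "region"},
    "cloud_crm": {"accountId": "customer_id", "accountName": "name", "salesRegion": "region"},
}


def to_canonical(source: str, record: dict) -> dict:
    """Rename a source record's fields to the canonical vocabulary, dropping unmapped fields."""
    mapping = FIELD_MAPS[source]
    return {canon: record[src] for src, canon in mapping.items() if src in record}


def unified_customer_view(batches: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge canonical records from all sources into one view keyed by customer_id.

    Sources are processed in the order given; a later source only fills in
    fields that earlier sources did not supply (it never overwrites them).
    Assumes every system uses the same customer ID.
    """
    view: dict[str, dict] = {}
    for source, records in batches.items():
        for rec in records:
            canon = to_canonical(source, rec)
            merged = view.setdefault(str(canon["customer_id"]), {})
            for key, value in canon.items():
                merged.setdefault(key, value)
    return view


batches = {
    "legacy_erp": [{"CUST_NO": "42", "CUST_NM": "ACME CORP", "REGION_CD": "NE", "FILLER": ""}],
    "cloud_crm": [
        {"accountId": "42", "accountName": "Acme Corp"},
        {"accountId": "77", "accountName": "Globex", "salesRegion": "W"},
    ],
}
view = unified_customer_view(batches)
print(view)
```

The merge rule here (first source wins, later sources only fill gaps) is a deliberately simple choice; an actual strategy would define which system is authoritative for each field.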
As enterprises embark on a new day of more advanced and valuable data analytics technology, largely built upon the cloud and big data, the data integration strategy should be systemic. This means mapping a path for the data from the source legacy systems to the views that the data analytics systems should include. What’s more, this data should be available in real operational time, because data analytics loses value as the data becomes older and out of date. We operate in a real-time world now.
So, the work ahead requires planning to occur at both the conceptual and physical levels to define how data analytics will work for your enterprise. This includes what you need to see, when you need to see it, and then mapping a path for the data back to the business-critical and, typically, legacy systems. Data integration should be first and foremost when planning the strategy, technology, and deployments.