Category Archives: Data Integration Platform
When it comes to cloud-based data analytics, a recent study by Ventana Research (cited in Loraine Lawson’s blog post) provides a few interesting data points. The study reveals that 40 percent of respondents cited lowered costs as a top benefit, improved efficiency was a close second at 39 percent, and better communication and knowledge sharing also ranked highly at 34 percent.
Ventana Research also found that organizations cite a unique and more complex reason to avoid cloud analytics and BI. Legacy integration work can be a major hindrance, particularly when BI tools are already integrated with other applications. In other words, it’s the same old story:
The ability to deal with existing legacy systems when moving to concepts such as big data or cloud-based analytics is critical to the success of any enterprise data analytics strategy. However, most enterprises don’t focus on data integration as much as they should, and hope that they can solve the problems using ad-hoc approaches.
You can’t make sense of data that you can’t see.
These approaches rarely work as well as they should, if at all. Thus, any investment made in data analytics technology is often diminished because the BI tools or applications that leverage analytics can’t see all of the relevant data. As a result, only part of the story is told by the available data, so those who leverage data analytics don’t rely on the information, and that means failure.
What’s frustrating to me about this issue is that the problem is easily solved. Those in the enterprise charged with standing up data analytics should put a plan in place to integrate new and legacy systems. As part of that plan, there should be a common understanding around business concepts/entities of a customer, sale, inventory, etc., and all of the data related to these concepts/entities must be visible to the data analytics engines and tools. This requires a data integration strategy, and technology.
As enterprises embark on a new day of more advanced and valuable data analytics technology, largely built upon the cloud and big data, the data integration strategy should be systemic. This means mapping a path for the data from the source legacy systems to the views that the data analytics systems should include. What’s more, this data should flow in real operational time, because data analytics loses value as the data becomes older and out-of-date. We operate in a real-time world now.
So, the work ahead requires planning to occur at both the conceptual and physical levels to define how data analytics will work for your enterprise. This includes what you need to see, when you need to see it, and then mapping a path for the data back to the business-critical and, typically, legacy systems. Data integration should be first and foremost when planning the strategy, technology, and deployments.
Amazon Redshift, one of the fast-rising stars in the AWS ecosystem, has taken the data warehousing world by storm ever since it was introduced almost two years ago. Amazon Redshift operates completely in the cloud and allows you to provision nodes on demand. This model allows you to overcome many of the pains associated with traditional data warehousing, such as provisioning extra server hardware, sizing and preparing databases for loading, or extensive SQL scripting.
However, when loading data into Redshift, you may find it challenging to do so in a timely manner. To reduce load times, you may have to spend a tremendous amount of time writing SQL optimization queries, which takes away from the value proposition of using Redshift in the first place.
Informatica Cloud helps you load this data into Redshift in just a few minutes. To start using Informatica Cloud, you’ll first need to establish connections to Redshift and to your other data sources. Here are a few easy steps to help you get started with establishing connections from a relational database such as MySQL, as well as from Redshift, into Informatica Cloud:
- Log in to your Informatica Cloud account, go to Configure -> Connections, click “New”, and select “MySQL” for “Type”
- Select your Secure Agent and fill in the rest of the database details:
- Test your connection and then click ‘OK’ to save and exit
- Now, log in to your AWS account and go to the Redshift service page
- Go to your cluster configuration page and make a note of the cluster and cluster database properties: Number of Nodes, Endpoint, Port, Database Name, JDBC URL. You also will need:
- The Redshift database user name and password (which is different from your AWS account)
- AWS account Access Key
- AWS account Secret Key
- Exit the AWS console.
- Now, back in your Informatica Cloud account, go to Configure -> Connections and click “New”.
- Select “AWS Redshift (Informatica)” for “Type” and fill in the rest of the details using the information you gathered above
- Test the connection and then click ‘OK’ to save and exit
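Under the hood, bulk loads into Redshift are typically performed with Redshift’s COPY command, pulling staged files from Amazon S3. As a rough, hypothetical illustration of what such a load statement looks like (the table name, bucket path, and credentials below are placeholders, not values from this walkthrough), a small helper that assembles the SQL might be sketched as:

```python
def build_redshift_copy_sql(table, s3_path, access_key, secret_key):
    """Assemble a Redshift COPY statement for bulk-loading CSV data from S3.

    All argument values are supplied by the caller; the credentials in the
    example below are placeholders, not real keys.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        f"CSV;"
    )

# Hypothetical table and bucket, for illustration only:
sql = build_redshift_copy_sql(
    "orders", "s3://my-bucket/orders.csv", "AKIA...", "SECRET...")
print(sql)
```

A managed tool generates and runs statements of this shape for you, which is exactly the hand-written optimization work the connection setup above lets you skip.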
As you can see, establishing connections was extremely easy and can be done in less than 5 minutes. To learn how customers such as UBM used Informatica Cloud to deliver next-generation customer insights with Amazon Redshift, please join us on September 16 for a webinar where we’ll have product experts from Amazon and UBM explaining how your company can benefit from cloud data warehousing for petabyte-scale analytics using Amazon Redshift.
I’ve “sold” data integration as a concept for the last 20 years. Let me tell you, it’s challenging to define the benefits to those who don’t work with this technology every day. That said, most of the complaints I hear about enterprise IT are around the lack of data integration, and thus the inefficiencies that go along with that lack, such as re-keying data, data quality issues, lack of automation across systems, and so forth.
Considering that most of you will sell data integration to your peers and leadership, I’ve come up with 3 proven ways to sell data integration internally.
First, focus on the business problems. Use real-world examples from your own business. It’s not tough to find any number of cases where the data was just not there to make core operational decisions, leading to huge mistakes that proved costly to the company. Or, more likely, there are things like an ineffective inventory management system that has no way to determine when orders need to be placed. Or, there’s the go-to standard: no single definition of what a “customer” or a “sale” is among the systems that support the business. That one is like back pain: everyone has it at some point.
Second, define the business case in practical terms with examples. Once you define the business problems that exist due to the lack of a sound data integration strategy and technology, it’s time to put numbers behind those problems. Those in IT have a tendency to either wildly overstate or wildly understate the amount of money that’s being wasted, and thus could be saved, by using data integration approaches and technology. So, provide practical numbers that you can back up with existing data.
Finally, focus on a phased approach to implementing your data integration solution. The “Big Bang Theory” is a great way to describe the beginning of the universe, but it’s not the way you want to define the rollout of your data integration technology. Define a workable plan that moves from one small grouping of systems and databases to another, over time, and with a reasonable amount of resources and technology. You do this to remove risk from the effort, manage costs, and ensure that you can dial lessons learned back into the effort. I would rather roll out data integration within an enterprise using small teams and well-scoped problem domains than attempt to do everything within a few years.
The reality is that data integration is no longer optional for enterprises these days. It’s required for so many reasons, from data sharing, information visibility, compliance, security, automation…the list goes on and on. IT needs to take point on this effort. Selling data integration internally is the first and most important step. Go get ‘em.
Adrian gathered experts and built workgroups to dig into the issue and do root cause analysis. The workgroups came back with some pretty surprising results.
- Most people expected that “incorrect data” (missing, out of date, incomplete, or wrong data) would be the main problem. What they found was that this was only #5 on the list of issues.
- The #1 issue was “Too much data.” People working with the data could not find the data they needed because there was too much data available, and it was hard to figure out which was the data they needed.
- The #2 issue was that people did not know the meaning of the data. And because people had different interpretations of the data, they often produced analyses with conflicting results. For example, “claims paid date” might mean the date the claim was approved, the date the check was cut, or the date the check cleared. These different interpretations resulted in significantly different numbers.
- In third place was the difficulty in accessing the data. Their environment was a forest of interfaces, access methods and security policies. Some were documented and some not.
In one of the workgroups, a senior manager put the problem in a larger business context:
“Not being able to leverage the data correctly allows competitors to break ground in new areas before we do. Our data in my opinion is the ‘MOST’ important element for our organization.”
What started as a relatively straightforward data quality project became a more comprehensive enterprise data management initiative that could literally change the entire organization. By the project’s end, Adrian found himself leading the data strategy of the organization.
This kind of story is happening with increasing frequency across all industries as all businesses become more digital, the quantity and complexity of data grows, and the opportunities to offer differentiated services based on data grow. We are entering an era of data-fueled organizations where the competitive advantage will go to those who use their data ecosystem better than their competitors.
Gartner is predicting that we are entering an era of increased technology disruption. Organizations that focus on data as their competitive edge will have the advantage. It has become clear that a strong enterprise data architecture is central to the strategy of any industry-leading organization.
For more future-thinking on the subject of enterprise data management and data architecture, see “Think ‘Data First’ to Drive Business Value”.
My first job out of college was to figure out how to get devices that monitored and controlled an advanced cooling and heating system to communicate with a centralized and automated control center. We ended up building custom PCs for the application, running a version of Unix (DOS would not cut it), and the PCs mounted in industrial cases would communicate with the temperature and humidity sensors, as well as turn on and turn off fans and dampers.
At the end of the day, this was a data integration problem, not an engineering problem, that we were attempting to solve. The devices had to talk to the PCs, and the PCs had to talk to a centralized system (a mainframe) that was able to receive the data, as well as use that data to determine what actions to take. For instance, the system had to determine that 78 degrees was too warm for a clean room, open a damper and turn on a fan to reduce the temperature, and then turn them off when the temperature returned to normal.
Back in the day, we had to create and deploy custom drivers and software. These days, most devices have well-defined interfaces, or APIs, that developers and data integration tools can access to gather information from that device. We also have high performing networks. Much like any source or target system, these devices produce data which is typically bound to a structure, and that data can be consumed and restructured to meet the needs of the target system.
For instance, data coming off a smart thermostat in your home may be in the following structure:
Device (char 10)
Date (char 8)
Temp (num 3)
You’re able to access this device using an API (typically a REST-based web service), which returns a single chunk of data bound to that structure.
Then you can transform the structure into something that’s native to the target system that receives this data, as well as translate the data (e.g., converting the data from characters to numbers). This is where data integration technology makes money for you, given its ability to deal with the complexity of translating and transforming the information that comes off the device, so it can be placed in a system or data store that’s able to monitor, analyze, and react to this data.
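As a concrete sketch (the JSON payload, field formats, and target field names here are hypothetical, derived only from the Device/Date/Temp structure above), the chunk returned by such a device API and its transformation for a target system might look like:

```python
import json

# A hypothetical chunk returned by the thermostat's REST API, bound to
# the structure above (Device char 10, Date char 8, Temp num 3):
raw = '{"Device": "THERMO-01", "Date": "20140901", "Temp": "078"}'

def transform(payload):
    """Restructure the device record for a hypothetical target system:
    rename the fields, reformat the date string to ISO format, and
    translate the temperature from characters to a number."""
    rec = json.loads(payload)
    d = rec["Date"]
    return {
        "device_id": rec["Device"].strip(),
        "reading_date": f"{d[0:4]}-{d[4:6]}-{d[6:8]}",  # "20140901" -> "2014-09-01"
        "temp_f": int(rec["Temp"]),                     # "078" -> 78
    }

print(transform(raw))
```

A data integration tool performs this same translate-and-transform step declaratively, without hand-written parsing code for each device.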
This is really what the IOT is all about; the ability to have devices spin out data that is leveraged to make better use of the devices. The possibilities are endless, as to what can be done with that data, and how we can better manage these devices. Data integration is key. Trust me, it’s much easier to integrate with devices these days than it was back in the day.
Thank you for reading about Data Integration with Devices! Editor’s note: For more information on Data Integration, consider downloading “Data Integration for Dummies”.
The conversation at the Gartner Enterprise Architecture Summit last week was very interesting. The central theme for years had been the idea of closely linking enterprise architecture with business goals and strategy. This year, Gartner added another layer to that conversation: they are now actively promoting the idea of enterprise architects as strategists.
The reason why is simple. The next wave of change is coming and it will significantly disrupt everybody. Even worse, your new competitors may be coming from other industries.
Enterprise architects are in a position to take a leading role within the strategy process. This is because they are the people who best understand both business strategy and technology trends.
Some of the key ideas discussed included:
- The boundaries between physical and digital products will blur
- Every organization will need a technology strategy to survive
- Gartner predicts that by 2017, 60% of the Global 1,000 will execute on at least one revolutionary and currently unimaginable business transformation effort.
- The change is being driven by trends such as mobile, social, the connectedness of everything, cloud/hybrid, software-defined everything, smart machines, and 3D printing.
I agree with all of this. My view is that this means it is time for enterprise architects to think very differently about architecture. Enterprise applications will come and go; they are rapidly being commoditized in any case. Architects need to think like strategists, in terms of market differentiation. And nothing will differentiate an organization more than its data. Example: Google’s autonomous cars. Google is jumping across industry boundaries to compete in a new market with data as its primary differentiator. There will be many others.
Years of thinking of architecture from an application-first or business process-first perspective have left us with silos of data and the classic “spaghetti diagram” of data architecture. This is slowing down business initiative delivery precisely at the time organizations need to accelerate and make data their strategic weapon. It is time to think data-first when it comes to enterprise architecture.
You will be seeing more from Informatica on this subject over the coming weeks and months.
Take a minute to comment on this article. Your thoughts on how we should go about changing to a data-first perspective, both pro and con, are welcome.
Also, remember that Informatica is running a contest to design the data architecture of the year 2020. Full details are here.
Question: What do American Airlines, Liberty Mutual, Discount Tire and MD Anderson all have in common?
a) They are all top in their field.
b) They all view data as critical to their business success.
c) They are all using Agile Data Integration to drive business agility.
d) They have spoken about their Data Integration strategy at Informatica World in Vegas.
Did you answer “all of the above”? If so, give yourself a ding, ding, ding. Or shall we say ka-ching, in honor of our host city?
Indeed, data experts from these companies and many more flocked to Las Vegas for Informatica World. They shared their enthusiasm for the important role of data in their businesses. These industry leaders discussed best practices that facilitate an Agile Data Integration process.
American Airlines recently completed a merger with US Airways, making them the largest airline in the world. In order to service critical reporting requirements for the merged airlines, the enterprise data team undertook a huge Data Integration task. This effort involved large-scale data migration and included many legacy data sources. The project required transferring over 4TB of current history data for Day 1 reporting. There is still a major task of integrating multiple combined subject areas in order to give a full picture of combined reporting.
American Airlines architects recommend the use of Data Integration design patterns in order to improve agility. The architects shared success factors for merger Data Integration. They discussed the importance of ownership by leadership from IT and business, and emphasized the benefit of open and honest communication between teams. The architects also highlighted the need to identify integration teams and priorities. Finally, they discussed the significance of understanding cultural differences and celebrating success. The team summarized with merger Data Integration lessons learned: metadata is key, IT and business collaboration is critical, and profiling and access to the data are helpful.
Liberty Mutual, the third largest property and casualty insurer in the US, has grown through acquisitions. The Data Integration team needs to support this business process. They have been busy integrating five claim systems into one. They are faced with a large-scale Data Integration challenge. To add to the complexity, their business requires that each phase is completed in one weekend, no data is lost in the process and that all finances balance out at the end of each merge. Integrating all claims in a single location was critical for smooth processing of insurance claims. A single system also leads to reduced costs and complexity for support and maintenance.
Liberty Mutual experts recommend a methodology of work preparation, profiling, delivery and validation. Rinse and repeat. Additionally, the company chose to utilize a visual Data Integration tool. This tool was quick and easy for the team to learn and greatly enhanced development agility.
Discount Tire, the largest independent tire dealer in the USA, shared tips and tricks from migrating legacy data into a new SAP system. This complex project included data conversion from 50 legacy systems. The company needs to combine and aggregate data from many systems, including customer, sales, financial, and supply chain data. This integrated system helps Discount Tire make key business decisions and remain a leader in a highly competitive space.
Discount Tire has automated their data validation process in development and in production. This reduces testing time, minimizes data defects and increases agility of development and operations. They have also implemented proactive monitoring in order to accomplish early detection and correction of data problems in production.
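Discount Tire’s exact validation process isn’t described, but automated data validation of this kind usually starts with simple source-versus-target reconciliation after each load: comparing row counts and flagging keys that exist on one side but not the other. A minimal sketch under those assumptions (the function, key name, and sample data are illustrative, not Discount Tire’s implementation):

```python
def reconcile(source_rows, target_rows, key="order_id"):
    """Compare source and target datasets after a load: report row-count
    totals and keys present on one side but not the other."""
    src_keys = {r[key] for r in source_rows}
    tgt_keys = {r[key] for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
    }

# Illustrative data: one row was dropped during the load.
source = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
target = [{"order_id": 1}, {"order_id": 3}]
print(reconcile(source, target))
```

Running checks like this automatically after every development and production load is what shrinks testing time and catches data defects early.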
MD Anderson Cancer Center is the No. 1 hospital for cancer care in the US according to U.S. News and World Report. They are pursuing the lofty goal of erasing cancer from existence. Data Integration is playing an important role in this fight against cancer. In order to accomplish their goal, MD Anderson researchers rely on integration of vast amounts of genomic, clinical and pharmaceutical data to facilitate leading-edge cancer research.
MD Anderson experts pursue Agile Data Integration through close collaboration between IT and business stakeholders. This enables them to meet the data requirements of the business faster and better. They shared that data insights, through metadata management, offer a significant value to the organization. Finally the experts at MD Anderson believe in ‘Map Once, Deploy Anywhere’ in order to accomplish Agile Data Integration.
So let’s recap, Data Integration is helping:
- An airline continue to serve its customers and run its business smoothly post-merger.
- A tire retail company to procure and provide tires to its customers and maintain leadership
- An insurance company to process claims accurately and in a timely manner, while minimizing costs, and
- A cancer research center to cure cancer.
Not too shabby, right? Data Integration is clearly essential to business success!
So OK, I know, I know… what happens in Vegas, stays in Vegas. Still, this was one love-fest I was compelled to share! Wish you were there. Hopefully you will next year!
To learn more about Agile Data Integration, check out this webinar: Great Data by Design II: How to Get Started with Next-Gen Data Integration
We can all imagine self-driving cars that distinguish between a life-threatening situation (like a swerving car ahead) and a thing-threatening occurrence (like a scurrying raccoon), and brake and steer accordingly. And we expect automated picking systems will soon know — by a SKU’s size, shape, weight and temperature — which assembly line or packing area gets which products. And it won’t be long before enterprise systems will see and plug security holes across hundreds of systems, no matter whether the data is hosted internally or held by partners and suppliers.
The underpinning for such smarts is data that’s clean, safe and connected — the hallmarks of everything we do and believe in at Informatica. But we also recognize that next-generation products need something more. They also need to know when and where data changes, along with how to get the right data to the right person, place or thing, in the right way. That’s why Informatica is unveiling our vision for an Intelligent Data Platform, fueled by new technology innovations in data intelligence.
Data intelligence is built on two new capabilities: live data map and inference engine. Live data map continuously updates all the metadata — structural, semantic, usage and otherwise — on all of the data flowing through an enterprise, while the inference engine can deduce user intentions, help humans search for what they need in their own natural language, and provide recommendations on the best way to consume data depending on the use case. The combination ensures that clean, safe and connected data gets to whomever or whatever needs it, as it’s needed — fast.
We at Informatica believe these capabilities are so incredibly vital for the enterprise that the Intelligent Data Platform now serves as the foundation of many of our future products — beginning with Project Springbok and Project Secure@Source™. These two new offerings simplify some of the toughest challenges facing people in the enterprise: letting business users find and use the data they need, and seeing where their most-sensitive data is hiding amidst all the nooks and crannies.
Project Springbok’s Excel-like interface lets everyday business folks and mere mortals find the data sets they’re interested in, fix formatting and quality issues, and do tasks that are a pain today to perform — such as combining data sets or publishing the results for colleagues to reuse and enhance. Project Springbok is also a guide, with its recommendations derived by the inference engine. It tells users the sources they could or should have access to, and then provisions only what they should have. It lets users see which data sets colleagues are most frequently accessing and finding the most valuable. It also alerts users to inconsistent or incomplete data, suggests ways to sort new combinations of data sets and recommends the best data for the task.
While we designed Project Springbok for the average business user, Project Secure@Source is intended for people responsible for protecting the enterprise, including chief risk officers, chief information security officers (CISOs) and even board members of public companies. That’s because Project Secure@Source’s graphical interface displays all the systems holding sensitive data, such as social security numbers, medical records or payment card information.
But it’s not enough just to know where that data is. To safeguard all the sensitive information about their products, their customers, and their employees, users also need to understand how that data got into these systems, how it moves around, and who is using it. Project Secure@Source does that, too — showing, for example, that an engineer used payment card data to test a Hadoop cluster, and left it there. With Project Secure@Source, users can selectively remove or mask that data from any system in the enterprise.
You’ll hear us talk about and showcase the Intelligent Data Platform, Project Springbok and Project Secure@Source at Informatica World on May 13 and 14. I hope you’ll join us to learn how our vision and our product roadmap will enable a smarter world for all of us, today.
What would the ideal data architecture of the year 2020 look like?
Informatica wants to know how YOU would answer that question. For this reason, we’ve created the Informatica Architect’s Challenge, a chance for YOU to share how you would approach enterprise data architecture differently. Send us your proposal and you could win 100 iPad Minis for the school of your choice.
There are a lot of challenges to think about here, but let’s start with these:
- Organizations are requiring dramatically faster delivery of business initiatives and are unhappy with the current performance of IT. Think this is “marketing hyperbole”? See the McKinsey survey.
- Data in most organizations is highly fragmented and scattered across dozens or hundreds of different systems. Simply finding and prepping data is becoming the majority of the work in any IT project.
- The problem is only going to get worse as cloud, 3rd party data, social, mobile, big data, and the Internet of Things dramatically increase the complexity of enterprise data environments.
Data is the one thing that uniquely differentiates your organization from its competitors. The question is: How are you going to architect to deliver the data that fuels your future business success? How will you manage the challenges of increasing complexity while delivering with the speed your organization requires?
It’s a chance to make a positive contribution to education, while at the same time gaining some professional visibility for yourself as a thought leader. We can’t wait to see what you’ll create!
For additional details, please visit the Informatica Architect’s Challenge official page.
- Does Data Integration technology truly provide a clear path toward unified data?
- Can businesses truly harness the potential of their information?
- Can companies take powerful action as a result?
Recently, Bloor Research set out to evaluate how things were actually playing out on the ground. In particular, they wanted to determine which data integration projects were actually taking place, at what scale, and with what results. The study, “Comparative Costs and Uses for Data Integration Platforms,” was authored by Philip Howard, research director at Bloor. The study examined data integration tool suitability across a range of scenarios, including:
- Data migration and consolidation projects
- Master data management (MDM) and associated solutions
- Application-to-application integration
- Data warehousing and business intelligence implementations
- Synching data with SaaS applications
- B2B data exchange
To draw conclusions, Bloor examined 292 responses from a range of companies. The responders used a variety of data integration approaches, from commercial data integration tools to “hand-coding.”
Informatica is pleased to be able to offer you a copy of this research for your review. The research covers areas like:
- Total Cost of Ownership (TCO)
We welcome you to download a copy of “Comparative Costs and Uses for Data Integration Platforms” today. We hope these findings offer you insights as you implement and evaluate your data integration projects and options.