Category Archives: Data Integration Platform
This post is by Philip Howard, Research Director, Bloor Research.
One of the standard metrics used to support buying decisions for enterprise software is total cost of ownership. Typically, the other major metric is functionality. However, functionality is ephemeral. Not only does it evolve with every new release, but while particular features may be relevant to today’s project, there is no guarantee that those same features will be applicable to tomorrow’s needs. A broader metric than functionality is capability: how suitable is this product for a range of different project scenarios, and will it support both simple and complex environments?
Earlier this year Bloor Research published research into the data integration market that investigated exactly these issues: how often were tools reused, how many sources and targets were involved, and for what sorts of projects were products deemed suitable? We then compared these results with the total cost of ownership figures that we also captured in our survey. I will be discussing the results of our research live with Kristin Kokie, interim CIO of Informatica, on Guy Fawkes Day (November 5th). I don’t promise anything explosive, but it should be interesting and I hope you can join us. The discussion will be vendor neutral (mostly: I expect that Kristin has a degree of bias).
Key findings from the report include:
- 65% of organizations cite data processing and integration as hampering distribution capability, with nearly half claiming their existing software and ERP systems are not suitable for distribution.
- Nearly two-thirds of enterprises have some form of distribution process, involving products or services.
- More than 80% of organizations have at least some problem with product or service distribution.
- More than 50% of CIOs in organizations with distribution processes believe better distribution would increase revenue and optimize business processes, with a further 38% citing reduced operating costs.
The core finding: “With better data integration comes better automation and decision making.”
This report is one of many I’ve seen over the years that come to the same conclusion. Most of those involved in business operations don’t have access to the key data points they need; thus they can’t automate tactical decisions, nor can they “mine” the data to understand the true state of the business.
The more a business deals with building and moving products, the more imperative data integration becomes. As stated in this survey, as well as others, the large majority cite “data processing and integration as hampering distribution capabilities.”
Of course, these issues go well beyond Australia. Most enterprises I’ve dealt with have some gap between the need to share key business data to support business processes and decision support, and what currently exists in terms of data integration capabilities.
The focus here is on the multiple kinds of value that data integration can bring. These include:
- The ability to track everything as it moves from manufacturing, to inventory, to distribution, and beyond. You can then bind this tracking to core business processes, such as automatically reordering parts to make more products to refill inventory.
- The ability to see into the past, and to see into the future. Emerging approaches to predictive analytics finally allow businesses to see into the future, as well as to see what truly went right and truly went wrong in the past.
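To make the reordering idea concrete, here is a minimal sketch of the textbook reorder-point rule, one common way such automation is triggered. It is our illustration, not a method described in the survey, and the function names and all numbers are hypothetical. Integrated data is what supplies the inputs: on-hand and on-order quantities from inventory tracking, and demand and lead time from sales and supplier history.

```python
def reorder_point(daily_demand: float, lead_time_days: float, safety_stock: float) -> float:
    """Classic reorder point: expected demand during the supplier lead time plus a safety buffer."""
    return daily_demand * lead_time_days + safety_stock


def should_reorder(on_hand: int, on_order: int, rop: float) -> bool:
    """Trigger a reorder when available stock (on hand plus already ordered) falls to the reorder point."""
    return on_hand + on_order <= rop


if __name__ == "__main__":
    # Hypothetical part: 40 units/day demand, 5-day lead time, 60 units of safety stock.
    rop = reorder_point(daily_demand=40, lead_time_days=5, safety_stock=60)
    print(rop)  # 260
    print(should_reorder(on_hand=250, on_order=0, rop=rop))  # True
```

The rule itself is trivial; the hard part in practice is getting accurate, current values for each input, which is exactly the data integration gap the survey describes.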
While data integration technology has been around for decades, most businesses that both manufacture and distribute products have not taken full advantage of this technology. The reasons range from perceptions around affordability, to the skills required to maintain the data integration flow. However, the truth is that you really can’t afford to ignore data integration technology any longer. It’s time to create and deploy a data integration strategy, using the right technology.
This survey is just one instance of a larger pattern. Data integration was considered optional in the past. With today’s emerging notions around the strategic use of data, it’s clearly no longer an option.
Did you know Harrods introduces more than 1.7 million new products every year? This includes their own labels, as well as other brands. Recently, Peter Rush, the Harrods Solution Architect responsible for product information, spoke at Informatica’s MDM Day EMEA in London. At the event, he said there are:
“so many things we want to do: Product Information is at the heart of most of them.”
As part of the customer experience program, Harrods identified product information quality as a key asset, next to customer information management.
The product information challenges Harrods faced included the following:
- Lack of a single product data store
- Inappropriate product data objectives
- Massive scale and volume of products and brands (1.7 million new products per year)
- Concessions and own-bought products
- Localized enrichment
- Media assets scattered across the estate
While discussing his product information management project, Peter gave a great and simple example. He showed the product descriptions below and asked, “Who knows which two products these are?”:
- XX 6621/74 BLK VN SS TOP 969B S
- XX37066 L/BLU PRK FLAN SH 440B MED
Then, he solved the mystery. The answer was this:
- Black V-neck sleeveless top
- Light blue parker print flannel shirt
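The gap between those codes and their meanings can be illustrated with a minimal sketch that expands known abbreviations and skips everything else (style numbers, supplier codes, sizes). The dictionary below is hypothetical, built only from the two examples above; it is not how Harrods or its PIM system actually enriches data.

```python
# Hypothetical abbreviation dictionary; a real PIM would manage this as governed reference data.
ABBREVIATIONS = {
    "BLK": "black",
    "L/BLU": "light blue",
    "VN": "V-neck",
    "SS": "sleeveless",
    "PRK": "parker print",
    "FLAN": "flannel",
    "TOP": "top",
    "SH": "shirt",
}


def expand_description(raw: str) -> str:
    """Turn a terse legacy product code into a readable description.

    Tokens not found in the dictionary (style numbers, supplier codes, sizes) are skipped.
    """
    words = [ABBREVIATIONS[token] for token in raw.split() if token in ABBREVIATIONS]
    text = " ".join(words)
    # Capitalize only the first character so "V-neck" keeps its capital V.
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(expand_description("XX 6621/74 BLK VN SS TOP 969B S"))
    print(expand_description("XX37066 L/BLU PRK FLAN SH 440B MED"))
```

Note how fragile this is: "SS" could just as easily mean "short sleeve" elsewhere, which is why product data needs agreed definitions rather than ad-hoc decoding.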
Turning vision into reality needs a joint business and IT project
Peter said it is important to build a “flexible team to meet needs of each project stage, with representation from key business areas”. The team should include representatives from groups such as Merchandise Data, the Buying Team, the Web Team, IT, CRM and the Shopfloor Team. In addition to its core project team, Harrods defined a steering committee and a group of selected super users.
Benefit summary: a combination of people, technology and process
A graphic shown at the end of the session impressed me because it sums up the essentials of product information management success. It is about the people, who are able to do the right things. It is about how technology enables automation. It is about the process that turns information into value.
Finally, it is worth mentioning that our partner Javelin Group is leading the PIM implementation at Harrods. Also, Andy Hayler, analyst at The Information Difference, wrote an article for CIO Magazine.
The Need for Scalable Enterprise Analytics
In 2012, Forbes published an article predicting an upcoming problem. Specifically, increased exploration of Big Data opportunities would place pressure on the typical corporate infrastructure. The generic hardware used to run most enterprise applications in the tech industry was not designed to handle real-time data processing. Meanwhile, the explosion of mobile usage and the proliferation of social networks were increasing the strain on these systems. Most companies now faced real-time processing requirements beyond what the traditional model was designed to handle.
In the past two years, the volume of data and the speed of its growth have increased significantly, and the problem has become more severe. It is now clear that these challenges can’t be overcome by simply doubling or tripling IT spending on infrastructure sprawl. Today, enterprises seek consolidated solutions that offer scalability, performance and ease of administration. The present need is for scalable enterprise analytics.
A Clear Solution Is Available
Informatica PowerCenter and Data Quality is the market-leading data integration and data quality platform. This platform has now been certified by Oracle as an optimal solution for both the Oracle Exadata Database Machine and the Oracle SuperCluster.
As the high-speed on-ramp for data into Oracle Exadata, PowerCenter and Data Quality deliver up to five times faster performance on data load, query, profiling and cleansing tasks. Informatica’s data integration customers can now easily reuse data integration code, skills and resources to access and transform any data from any data source and load it into Exadata, with the highest throughput and scalability.
Customers adopting Oracle Exadata for high-volume, high-speed analytics can now be confident in Informatica PowerCenter and Data Quality. With these products, they can ingest, cleanse and transform all types of data into Exadata with the highest performance and scale required to maximize the value of their Exadata investment.
Proving the Value of Scalable Enterprise Analytics
To demonstrate the efficacy of their partnership, the two companies worked together on a Proof of Value (POV) project. The goal was to prove that using PowerCenter with Exadata would improve both performance and scalability. The project used PowerCenter and Data Quality 9.6.1 and an Exadata X4-2 machine. Oracle 11g was used for both the standard Oracle and the Exadata configurations.
The first test was a 1TB load into Exadata and into standard Oracle in a typical PowerCenter use case. The second test queried a 1TB profiling warehouse database in a Data Quality use case scenario. Performance data was collected for both tests, and the scalability factor was also captured. A variant of the TPC-H dataset was used to generate the test data. The results were significantly better than those of a prior 1TB Exadata test. In particular:
- The data query tests achieved a 5x performance improvement.
- The data load tests achieved a 3x-5x speed increase.
- Linear scalability was achieved with read/write tests on Exadata.
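For readers interpreting these figures, the speedup factor and linear scalability can be written as follows. These are standard definitions, not formulas taken from the test report; the baseline is assumed to be standard Oracle, as the benefits list below states, and X(n) denotes throughput with n units of capacity.

```latex
S = \frac{T_{\text{standard Oracle}}}{T_{\text{Exadata}}},
\qquad
S = 5 \;\Longrightarrow\; T_{\text{Exadata}} = \tfrac{1}{5}\,T_{\text{standard Oracle}} = 0.2\,T_{\text{standard Oracle}}

X(n) \approx n \cdot X(1)
```

Read this way, a 3x to 5x load speedup means a load takes between one fifth and one third (20% to roughly 33%) of its standard-Oracle elapsed time, and linear scalability means doubling capacity roughly doubles throughput.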
What Business Benefits Could You Expect?
Informatica PowerCenter and Data Quality, along with Oracle Exadata, now provide the best-of-breed combination of software and hardware, optimized to deliver the highest possible total system performance. These comprehensive tools drive agile reporting and analytics, while empowering IT organizations to meet SLAs and quality goals like never before.
- Extend Oracle Exadata’s access to even more business critical data sources. Utilize optimized out-of-the-box Informatica connectivity to easily access hundreds of data sources, including all the major databases, on-premises and cloud applications, mainframe, social data and Hadoop.
- Get more data, more quickly into Oracle Exadata. Move higher volumes of trusted data quickly into Exadata to support timely reporting with up-to-date information (up to 5x faster than a standard Oracle database).
- Centralize management and improve insight into large scale data warehouses. Deliver the necessary insights to stakeholders with intuitive data lineage and a collaborative business glossary. Contribute to high quality business analytics, in a timely manner across the enterprise.
- Instantly re-direct workloads and resources to Oracle Exadata without compromising performance. Leverage existing code and programming skills to execute high-performance data integration directly on Exadata by performing pushdown optimization.
- Roll-out data integration projects faster and more cost-effectively. Customers can now leverage thousands of Informatica-certified developers to execute existing data integration and quality transformations directly on Oracle Exadata, without any additional coding.
- Efficiently scale-up and scale-out. Customers can now maximize performance and lower the costs of data integration and quality operations of any scale by performing Informatica workload and pushdown optimization on Oracle Exadata.
- Save significant costs involved in administration and expansion. Customers can now easily and economically manage large-scale analytics data warehousing environments with a single point of administration and control, and consolidate a multitude of servers on one rack.
- Reduce risk. Customers can now leverage Informatica’s data integration and quality platform to overcome the typical performance and scalability limitations seen in databases and data storage systems. This will help reduce quality-of-service risks as data volumes rise.
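Pushdown optimization, as the bullets above use the term, generally means translating transformation logic into SQL that runs inside the database rather than pulling every row out to the integration server. The sketch below shows the idea only: it uses Python’s built-in sqlite3 module as a stand-in database with a hypothetical sales table, and it is not Informatica’s implementation or Exadata-specific.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory table of hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0), ("AMER", 20.0), ("AMER", 30.0)],
    )
    return conn


def totals_in_engine(conn: sqlite3.Connection) -> dict:
    """Without pushdown: fetch every source row, then aggregate in the integration layer."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    """With pushdown: express the same aggregation as SQL so the database does the work."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = make_db()
    print(totals_in_engine(conn))
    print(totals_pushed_down(conn))
```

Both functions return the same totals; the difference is that the pushed-down version moves one row per region across the wire instead of every source row, which is where the performance gain comes from at terabyte scale.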
Oracle Exadata is a well-engineered system that offers customers out-of-box scalability and performance on demand. Informatica PowerCenter and Data Quality are optimized to run on Exadata, offering customers business benefits that speed up data integration and data quality tasks like never before. Informatica’s certified, optimized, and purpose-built solutions for Oracle can help you enable more timely and trustworthy reporting. You can now benefit from Informatica’s optimized solutions for Oracle Exadata to make better business decisions by unlocking the full potential of the most current and complete enterprise data available. As shown in our test results, you can attain up to 5x performance by scaling Exadata. Informatica Data Quality customers can now profile 1TB datasets, which was previously unheard of. We urge you to deploy the combined solution to solve your data integration and quality problems today, while achieving high-speed business analytics in these days of big data exploration and the Internet of Things.
Listen to what Ash Kulkarni, SVP, had to say at OOW14 about how Informatica PowerCenter and Data Quality, certified by Oracle as optimized for Exadata, can deliver up to five times faster performance on data load, query, profiling, cleansing and mastering tasks for Exadata.
When it comes to cloud-based data analytics, a recent study by Ventana Research (as found in Loraine Lawson’s recent blog post) provides a few interesting data points. The study reveals that 40 percent of respondents cited lowered costs as a top benefit, improved efficiency was a close second at 39 percent, and better communication and knowledge sharing also ranked highly at 34 percent.
Ventana Research also found that organizations cite a unique and more complex reason to avoid cloud analytics and BI. Legacy integration work can be a major hindrance, particularly when BI tools are already integrated with other applications. In other words, it’s the same old story:
The ability to deal with existing legacy systems when moving to concepts such as big data or cloud-based analytics is critical to the success of any enterprise data analytics strategy. However, most enterprises don’t focus on data integration as much as they should, and hope that they can solve the problems using ad-hoc approaches.
You can’t make sense of data that you can’t see.
These approaches rarely work as well as they should, if at all. Thus, any investment made in data analytics technology is often diminished because the BI tools or applications that leverage analytics can’t see all of the relevant data. As a result, the available data tells only part of the story, those who use data analytics stop relying on the information, and that means failure.
What’s frustrating to me about this issue is that the problem is easily solved. Those in the enterprise charged with standing up data analytics should put a plan in place to integrate new and legacy systems. As part of that plan, there should be a common understanding of business concepts or entities such as customer, sale and inventory, and all of the data related to these concepts must be visible to the data analytics engines and tools. This requires a data integration strategy, and technology.
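A minimal sketch of what a “common understanding” of one entity can look like in practice: two legacy record layouts mapped into a single shared Customer definition, then grouped so analytics tools see one customer. Both source layouts and all field names are hypothetical, and matching on normalized email is a simplifying assumption; real matching rules are usually richer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Customer:
    """One shared definition of 'customer' that every analytics tool consumes."""
    source: str
    source_id: str
    name: str
    email: str


def from_crm(rec: dict) -> Customer:
    # Hypothetical CRM export layout: {"CustID", "FullName", "EmailAddr"}
    return Customer("crm", rec["CustID"].strip(), rec["FullName"].strip(), rec["EmailAddr"].strip().lower())


def from_erp(rec: dict) -> Customer:
    # Hypothetical legacy ERP layout: {"cust_no" (int), "first", "last", "email"}
    name = f"{rec['first'].strip()} {rec['last'].strip()}"
    return Customer("erp", str(rec["cust_no"]), name, rec["email"].strip().lower())


def unified_view(customers: Iterable[Customer]) -> Dict[str, List[Customer]]:
    """Group source records by a matching key (here, normalized email) so analytics sees one customer."""
    view: Dict[str, List[Customer]] = {}
    for customer in customers:
        view.setdefault(customer.email, []).append(customer)
    return view


if __name__ == "__main__":
    crm = from_crm({"CustID": " C042 ", "FullName": "Ada Lovelace ", "EmailAddr": "Ada@Example.com"})
    erp = from_erp({"cust_no": 42, "first": "Ada", "last": "Lovelace", "email": "ADA@example.com"})
    for key, records in unified_view([crm, erp]).items():
        print(key, [r.source for r in records])
```

The point is less the code than the agreement it encodes: once every source maps into the same definition, the analytics layer no longer has to guess which system’s “customer” it is looking at.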
As enterprises embark on a new era of more advanced and valuable data analytics technology, largely built upon the cloud and big data, the data integration strategy should be systemic. This means mapping a path for the data from the legacy source systems to the views that the data analytics systems should include. What’s more, this data should be available in real operational time, because data analytics loses value as the data becomes older and out of date. We operate in a real-time world now.
So, the work ahead requires planning to occur at both the conceptual and physical levels to define how data analytics will work for your enterprise. This includes what you need to see, when you need to see it, and then mapping a path for the data back to the business-critical and, typically, legacy systems. Data integration should be first and foremost when planning the strategy, technology, and deployments.
Amazon Redshift, one of the fast-rising stars in the AWS ecosystem, has taken the data warehousing world by storm ever since it was introduced almost two years ago. Amazon Redshift operates completely in the cloud and allows you to provision nodes on demand. This model allows you to overcome many of the pains associated with traditional data warehousing techniques, such as provisioning extra server hardware, sizing and preparing databases for loading, and extensive SQL scripting.
However, you may find it challenging to load data into Redshift in a timely manner. To reduce load times, you may have to spend a tremendous amount of time writing and tuning SQL, which takes away from the value proposition of using Redshift in the first place.
Informatica Cloud helps you load this data into Redshift in just a few minutes. To start using Informatica Cloud, you’ll first need to establish connections to Redshift and your other data sources. Here are a few easy steps for setting up Informatica Cloud connections to a relational database such as MySQL, as well as to Redshift:
- Log in to your Informatica Cloud account, go to Configure -> Connections, click “New”, and select “MySQL” for “Type”
- Select your Secure Agent and fill in the rest of the database details
- Test your connection and then click ‘OK’ to save and exit
- Now, log in to your AWS account and go to the Redshift service page
- Go to your cluster configuration page and make a note of the cluster and cluster database properties: Number of Nodes, Endpoint, Port, Database Name, JDBC URL. You also will need:
- The Redshift database user name and password (which is different from your AWS account)
- AWS account Access Key
- AWS account Secret Key
- Exit the AWS console.
- Now, back in your Informatica Cloud account, go to Configure -> Connections and click “New”.
- Select “AWS Redshift (Informatica)” for “Type” and fill in the remaining details using the information you noted above
- Test the connection and then click ‘OK’ to save and exit
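If you want a quick way to make sure nothing is missing before filling in the Redshift connection form, a small checklist object can hold the cluster properties and credentials gathered from the AWS console. This is a hypothetical helper of our own, not part of Informatica Cloud; the field names are our labels, and the URL follows the jdbc:redshift://endpoint:port/database form that the cluster page shows.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class RedshiftConnectionDetails:
    """Cluster properties and credentials noted from the AWS console (field names are hypothetical)."""
    endpoint: str                       # cluster endpoint host name, without port or database
    port: int                           # Redshift's default port is 5439
    database: str
    user: str                           # Redshift database user, not your AWS account login
    password: str = field(repr=False)   # repr=False keeps secrets out of printed output
    access_key: str = field(repr=False)
    secret_key: str = field(repr=False)

    def missing(self) -> List[str]:
        """Names of any details still blank, so nothing is forgotten before filling in the form."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", None)]

    def jdbc_url(self) -> str:
        """Build a URL of the form jdbc:redshift://<endpoint>:<port>/<database>."""
        return f"jdbc:redshift://{self.endpoint}:{self.port}/{self.database}"


if __name__ == "__main__":
    details = RedshiftConnectionDetails(
        endpoint="examplecluster.abc123.us-west-2.redshift.amazonaws.com",
        port=5439,
        database="dev",
        user="awsuser",
        password="",  # not filled in yet
        access_key="EXAMPLE_ACCESS_KEY",
        secret_key="EXAMPLE_SECRET_KEY",
    )
    print(details.missing())
    print(details.jdbc_url())
```

In real use, read the password and keys from environment variables or a secrets store rather than hard-coding them.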
As you can see, establishing connections was extremely easy and can be done in less than 5 minutes. To learn how customers such as UBM used Informatica Cloud to deliver next-generation customer insights with Amazon Redshift, please join us on September 16 for a webinar where we’ll have product experts from Amazon and UBM explaining how your company can benefit from cloud data warehousing for petabyte-scale analytics using Amazon Redshift.
I’ve “sold” data integration as a concept for the last 20 years. Let me tell you, it’s challenging to define the benefits to those who don’t work with this technology every day. That said, most of the complaints I hear about enterprise IT are around the lack of data integration, and thus the inefficiencies that go along with that lack, such as re-keying data, data quality issues, lack of automation across systems, and so forth.
Considering that most of you will sell data integration to your peers and leadership, I’ve come up with 3 proven ways to sell data integration internally.
First, focus on the business problems. Use real-world examples from your own business. It’s not tough to find any number of cases where the data was just not there to make core operational decisions that could have avoided some huge, costly mistakes. Or, more likely, there are things like ineffective inventory management that has no way to understand when orders need to be placed. Or, there’s the go-to standard: no single definition of what a “customer” or a “sale” is among the systems that support the business. That one is like back pain; everyone has it at some point.
Second, define the business case in practical terms with examples. Once you define the business problems that exist due to the lack of a sound data integration strategy and technologies, it’s time to put dollar figures on them. Those in IT have a tendency to either way overstate or way understate the amount of money that’s being wasted, and thus could be saved, by using data integration approaches and technology. So, provide practical numbers that you can back up with existing data.
Finally, focus on a phased approach to implementing your data integration solution. The “Big Bang Theory” is a great way to define the beginning of the universe, but it’s not the way you want to define the rollout of your data integration technology. Define a workable plan that moves from one small grouping of systems and databases to another over time, with a reasonable amount of resources and technology. You do this to remove risk from the effort, manage costs, and ensure that you can dial lessons learned back into the effort. I would rather roll out data integration within an enterprise using small teams and more problem domains than attempt to do everything within a few years.
The reality is that data integration is no longer optional for enterprises these days. It’s required for so many reasons, from data sharing, information visibility, compliance, security, automation…the list goes on and on. IT needs to take point on this effort. Selling data integration internally is the first and most important step. Go get ‘em.
Adrian gathered experts and built workgroups to dig into the issue and do root cause analysis. The workgroups came back with some pretty surprising results.
- Most people expected that “incorrect data” (missing, out of date, incomplete, or wrong data) would be the main problem. What they found was that this was only #5 on the list of issues.
- The #1 issue was “Too much data.” People working with the data could not find the data they needed because there was too much data available, and it was hard to figure out which was the data they needed.
- The #2 issue was that people did not know the meaning of data. And because people had different interpretations of the data, they often produced analyses with conflicting results. For example, “claims paid date” might mean the date the claim was approved, the date the check was cut or the date the check cleared. These different interpretations resulted in significantly different numbers.
- In third place was the difficulty in accessing the data. Their environment was a forest of interfaces, access methods and security policies. Some were documented and some not.
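The “claims paid date” example can be reproduced in a few lines. The sketch below uses three toy claims invented for illustration (not data from Adrian’s project) and counts how many were “paid” in one month under each of the three interpretations; the names are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple


class Claim(NamedTuple):
    """One claim with the three dates that different teams each called the 'claims paid date'."""
    approved: date
    check_cut: date
    check_cleared: date


# Toy records invented for illustration only.
CLAIMS: List[Claim] = [
    Claim(date(2014, 9, 28), date(2014, 9, 30), date(2014, 10, 3)),
    Claim(date(2014, 9, 29), date(2014, 10, 1), date(2014, 10, 6)),
    Claim(date(2014, 10, 2), date(2014, 10, 3), date(2014, 10, 8)),
]


def claims_paid_in_month(claims: List[Claim], year: int, month: int, definition: str) -> int:
    """Count claims 'paid' in a month, where `definition` names which date field means 'paid'."""
    if definition not in Claim._fields:
        raise ValueError(f"unknown definition {definition!r}; expected one of {Claim._fields}")
    count = 0
    for claim in claims:
        paid = getattr(claim, definition)
        if paid.year == year and paid.month == month:
            count += 1
    return count


if __name__ == "__main__":
    for definition in Claim._fields:
        print(definition, claims_paid_in_month(CLAIMS, 2014, 10, definition))
```

The same three claims yield one, two or three October payments depending on the definition, which is why a shared business glossary matters as much as access to the data itself.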
In one of the workgroups, a senior manager put the problem in a larger business context:
“Not being able to leverage the data correctly allows competitors to break ground in new areas before we do. Our data in my opinion is the ‘MOST’ important element for our organization.”
What started as a relatively straightforward data quality project became a more comprehensive enterprise data management initiative that could literally change the entire organization. By the project’s end, Adrian found himself leading the data strategy of the organization.
This kind of story is happening with increasing frequency across all industries, as businesses become more digital, as the quantity and complexity of data grow, and as the opportunities to offer differentiated services based on data increase. We are entering an era of data-fueled organizations where the competitive advantage will go to those who use their data ecosystem better than their competitors.
Gartner is predicting that we are entering an era of increased technology disruption. Organizations that focus on data as their competitive edge will have the advantage. It has become clear that a strong enterprise data architecture is central to the strategy of any industry-leading organization.
For more future-thinking on the subject of enterprise data management and data architecture, see “Think ‘Data First’ to Drive Business Value”.
My first job out of college was to figure out how to get devices that monitored and controlled an advanced cooling and heating system to communicate with a centralized, automated control center. We ended up building custom PCs for the application, running a version of Unix (DOS would not cut it). The PCs, mounted in industrial cases, communicated with the temperature and humidity sensors, and turned fans and dampers on and off.
At the end of the day, this was a data integration problem, not an engineering problem. The devices had to talk to the PCs, and the PCs had to talk to a centralized system (a mainframe) that could receive the data and use it to determine what actions to take. For instance, the system had to be able to determine that 78 degrees was too warm for a clean room, that a damper had to be opened and a fan turned on to reduce the temperature, and that the fan had to be turned off when the temperature returned to normal.
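The clean-room rule described above can be sketched as a small controller with two thresholds (hysteresis), so the equipment doesn’t flip on and off around a single set point. The 78-degree trigger comes from the text; the 72-degree “back to normal” level, the class name, and closing the damper when cooling stops are all assumptions for illustration, not details of the original system.

```python
from dataclasses import dataclass


@dataclass
class CleanRoomController:
    """Turns cooling on when the room gets too warm and off once it is back to normal."""
    too_warm: float = 78.0   # trigger from the example in the text
    normal: float = 72.0     # hypothetical "back to normal" level
    cooling: bool = False    # True while the damper is open and the fan is running

    def update(self, temp: float) -> str:
        """Process one sensor reading and return the action to send to the devices."""
        if not self.cooling and temp >= self.too_warm:
            self.cooling = True
            return "open damper, fan on"
        if self.cooling and temp <= self.normal:
            self.cooling = False
            # Assumption: the damper closes when the fan turns off.
            return "close damper, fan off"
        return "no change"


if __name__ == "__main__":
    controller = CleanRoomController()
    for reading in [74, 78, 76, 72, 75]:
        print(reading, controller.update(reading))
```

Readings between the two thresholds leave the current state alone, which is the design choice that keeps the fan from cycling every time the temperature wobbles around 78.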
Back in the day, we had to create and deploy custom drivers and software. These days, most devices have well-defined interfaces, or APIs, that developers and data integration tools can access to gather information from that device. We also have high-performing networks. Much like any source or target system, these devices produce data that is typically bound to a structure, and that data can be consumed and restructured to meet the needs of the target system.
For instance, data coming off a smart thermostat in your home may be in the following structure:
Device (char 10)
Date (char 8)
Temp (num 3)
You’re able to access this device using an API (typically a REST-based Web Service), which returns a single chunk of data which is bound to the structure, such as:
Then you can transform the structure into something that’s native to the target system that receives this data, as well as translate the data (e.g., converting the Date from characters to numbers). This is where data integration technology makes money for you, given its ability to handle the complexity of translating and transforming the information that comes off the device, so it can be placed in a system or data store that’s able to monitor, analyze, and react to this data.
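Here is a minimal sketch of that translate-and-transform step for the thermostat structure above (Device char 10, Date char 8, Temp num 3). It assumes the fields arrive as one fixed-width string and that the 8-character date is YYYYMMDD; both assumptions, the sample record, and the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Field widths from the structure in the text: Device (char 10), Date (char 8), Temp (num 3).
LAYOUT = [("device", 10), ("date", 8), ("temp", 3)]


@dataclass
class Reading:
    """Target-side representation with native types instead of raw characters."""
    device: str
    reading_date: date
    temp: int


def parse_record(raw: str) -> Reading:
    """Slice a fixed-width device record and convert each field to a native type."""
    expected = sum(width for _, width in LAYOUT)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} characters, got {len(raw)}")
    values, pos = {}, 0
    for name, width in LAYOUT:
        values[name] = raw[pos:pos + width]
        pos += width
    return Reading(
        device=values["device"].strip(),
        # Assumption: the 8-character date is formatted YYYYMMDD.
        reading_date=datetime.strptime(values["date"], "%Y%m%d").date(),
        temp=int(values["temp"]),
    )


if __name__ == "__main__":
    # Hypothetical record: device id, date, temperature.
    print(parse_record("THERMO0042" + "20141105" + "078"))
```

Once the record is in native types, the target system can compare temperatures numerically and query by date, which is the point of translating rather than just copying the device output.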
This is really what the IoT is all about: the ability to have devices spin out data that is leveraged to make better use of the devices. The possibilities are endless as to what can be done with that data, and how we can better manage these devices. Data integration is key. Trust me, it’s much easier to integrate with devices these days than it was back in the day.
Thank you for reading about Data Integration with Devices! Editor’s note: For more information on Data Integration, consider downloading “Data Integration for Dummies”
The conversation at last week’s Gartner Enterprise Architecture Summit was very interesting. The central theme for years had been the idea of closely linking enterprise architecture with business goals and strategy. This year, Gartner added another layer to that conversation: they are now actively promoting the idea of enterprise architects as strategists.
The reason why is simple. The next wave of change is coming and it will significantly disrupt everybody. Even worse, your new competitors may be coming from other industries.
Enterprise architects are in a position to take a leading role within the strategy process. This is because they are the people who best understand both business strategy and technology trends.
Some of the key ideas discussed included:
- The boundaries between physical and digital products will blur
- Every organization will need a technology strategy to survive
- Gartner predicts that by 2017, 60% of the Global 1,000 will execute on at least one revolutionary and currently unimaginable business transformation effort.
- The change is being driven by trends such as mobile, social, the connectedness of everything, cloud/hybrid, software-defined everything, smart machines, and 3D printing.
I agree with all of this. My view is that it is time for enterprise architects to think very differently about architecture. Enterprise applications will come and go; they are rapidly being commoditized in any case. Enterprise architects need to think like strategists, in terms of market differentiation, and nothing will differentiate an organization more than its data. Take Google’s autonomous cars: Google is jumping across industry boundaries to compete in a new market with data as its primary differentiator. There will be many others.
Years of thinking about architecture from an application-first or business process-first perspective have left us with silos of data and the classic “spaghetti diagram” of data architecture. This is slowing down business initiative delivery precisely at the time organizations need to accelerate and make data their strategic weapon. It is time to think data-first when it comes to enterprise architecture.
You will be seeing more from Informatica on this subject over the coming weeks and months.
Take a minute to comment on this article. Your thoughts, both pro and con, on how we should go about changing to a data-first perspective are welcome.
Also, remember that Informatica is running a contest to design the data architecture of the year 2020. Full details are here.