Category Archives: Data Integration
“Raw materials costs are the company’s single largest expense category,” said Steve Jenkins, Global IT Director at Valspar, at MDM Day in London. “Data management technology can help us improve business process efficiency, manage sourcing risk and reduce RFQ cycle times.”
Valspar is a $4 billion global manufacturing company that produces a portfolio of leading paint and coating brands. At the end of 2013, the 200-year-old company celebrated record sales and earnings. It also completed two acquisitions. Valspar now has 10,000 employees operating in 25 countries.
As is the case for many global companies, growth creates complexity. “Valspar has multiple business units with varying purchasing practices. We source raw materials from 1,000s of vendors around the globe,” shared Steve.
“We want to achieve economies of scale in purchasing to control spending,” Steve said as he shared Valspar’s improvement objectives. “We want to build stronger relationships with our preferred vendors. Also, we want to develop internal process efficiencies to realize additional savings.”
Poorly managed vendor and raw materials data was impacting Valspar’s buying power
The Valspar team, which focuses sharply on productivity, had an “Aha” moment. “We realized our buying power was limited by the age and quality of available vendor data and raw materials data,” revealed Steve.
The core vendor data and raw materials data that should have been the same across multiple systems wasn’t. Data was often missing or wrong. This made it difficult to calculate the total spend on raw materials. It was also hard to calculate the total cost of expedited freight of raw materials. So, employees used a manual, time-consuming and error-prone process to consolidate vendor data and raw materials data for reporting.
These data issues were getting in the way of achieving their improvement objectives. Valspar needed a data management solution.
Valspar needed a single trusted source of vendor and raw materials data
The team chose Informatica MDM, master data management (MDM) technology. It will be their enterprise hub for vendors and raw materials. It will manage this data centrally on an ongoing basis. With Informatica MDM, Valspar will have a single trusted source of vendor and raw materials data.
Informatica PowerCenter will access data from multiple source systems. Informatica Data Quality will profile the data before it goes into the hub. Then, after Informatica MDM does its magic, PowerCenter will deliver clean, consistent, connected and enriched data to target systems.
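To make the consolidation problem concrete, here is a minimal Python sketch of the kind of matching that has to happen before spend can be totaled across systems. It is not how Informatica MDM works internally. The record layout, the vendor names and the `normalize_vendor_name` helper are all hypothetical, and real matching relies on far richer rules than stripping punctuation and legal suffixes.

```python
from collections import defaultdict

# Hypothetical legal suffixes to ignore when building a match key.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_vendor_name(name: str) -> str:
    """Build a crude match key: lowercase, drop punctuation and legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    tokens = [token for token in cleaned.split() if token not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def total_spend_by_vendor(records):
    """Sum spend per normalized vendor key across all source systems."""
    totals = defaultdict(float)
    for record in records:
        totals[normalize_vendor_name(record["vendor"])] += record["spend"]
    return dict(totals)


# Made-up records from two hypothetical source systems.
records = [
    {"system": "erp_a", "vendor": "Acme Pigments, Inc.", "spend": 120000.0},
    {"system": "erp_b", "vendor": "ACME PIGMENTS INC", "spend": 80000.0},
    {"system": "erp_b", "vendor": "Blue Resin Ltd", "spend": 50000.0},
]

totals = total_spend_by_vendor(records)
for vendor, spend in totals.items():
    print(vendor, spend)
```

In this toy data, the two spellings of the same vendor collapse into one key, so their spend is reported as a single total instead of two unrelated line items.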
Better vendor and raw materials data management results in cost savings
Valspar expects to gain the following business benefits:
- Streamline the RFQ process to accelerate raw materials cost savings
- Reduce the total number of raw materials SKUs and vendors
- Increase productivity of staff focused on pulling and maintaining data
- Leverage consistent global data visibility to:
  - increase leverage during contract negotiations
  - improve acquisition due diligence reviews
  - facilitate process standardization and reporting
Valspar’s vision is to transform data and information into trusted organizational assets
“Mastering vendor and raw materials data is Phase 1 of our vision to transform data and information into trusted organizational assets,” shared Steve. In Phase 2 the Valspar team will master customer data so they have immediate access to the total purchases of key global customers. In Phase 3, Valspar’s team will turn their attention to product or finished goods data.
Steve ended his presentation with some advice. “First, include your business counterparts in the process as early as possible. They need to own and drive the business case as well as the approval process. Also, master only the vendor and raw materials attributes required to realize the business benefit.”
Want more? Download the Total Supplier Information Management eBook. It covers:
- Why your fragmented supplier data is holding you back
- The cost of supplier data chaos
- The warning signs you need to be looking for
- How you can achieve Total Supplier Information Management
In 2012, Forbes published an article predicting an upcoming problem.
The Need for Scalable Enterprise Analytics
Specifically, increased exploration in Big Data opportunities would place pressure on the typical corporate infrastructure. The generic hardware used to run most tech industry enterprise applications was not designed to handle real-time data processing. As a result, the explosion of mobile usage and the proliferation of social networks were increasing the strain on the system. Most companies now faced real-time processing requirements beyond what the traditional model was designed to handle.
In the past two years, the volume of data and the speed of data growth have increased significantly. As a result, the problem has become more severe. It is now clear that enterprises can’t overcome these challenges by simply doubling or tripling their IT spending on infrastructure sprawl. Today, enterprises seek consolidated solutions that offer scalability, performance and ease of administration. The present need is for scalable enterprise analytics.
A Clear Solution Is Available
Informatica PowerCenter and Data Quality is the market-leading data integration and data quality platform. This platform has now been certified by Oracle as an optimal solution for both the Oracle Exadata Database Machine and the Oracle SuperCluster.
As the high-speed on-ramp for data into Oracle Exadata, PowerCenter and Data Quality deliver up to five times faster performance on data load, query, profiling and cleansing tasks. Informatica’s data integration customers can now easily reuse data integration code, skills and resources to access and transform any data from any data source and load it into Exadata, with the highest throughput and scalability.
Customers adopting Oracle Exadata for high-volume, high-speed analytics can now be confident with Informatica PowerCenter and Data Quality. With these products, they can ingest, cleanse and transform all types of data into Exadata with the highest performance and scale required to maximize the value of their Exadata investment.
Proving the Value of Scalable Enterprise Analytics
In order to demonstrate the efficacy of their partnership, the two companies worked together on a Proof of Value (POV) project. The goal was to prove that using PowerCenter with Exadata would improve both performance and scalability. The project involved PowerCenter and Data Quality 9.6.1 and an Exadata X4-2 machine. Oracle 11g was considered for both the standard Oracle and Exadata versions.
The first test was a 1TB load to Exadata and to standard Oracle in a typical PowerCenter use case. The second test consisted of querying a 1TB profiling warehouse database in a Data Quality use case. Performance data was collected for both tests. The scalability factor was also captured. A variant of the TPC-H dataset was used to generate the test data. The results were significantly higher than in the prior Exabyte 1TB test. In particular:
- The data query tests achieved 5x performance.
- The data load tests achieved a 3x-5x speed increase.
- Linear scalability was achieved with read/write tests on Exadata.
What Business Benefits Could You Expect?
Informatica PowerCenter and Data Quality, along with Oracle Exadata, now provide the best-of-breed combination of software and hardware, optimized to deliver the highest possible total system performance. These comprehensive tools drive agile reporting and analytics, while empowering IT organizations to meet SLAs and quality goals like never before.
- Extend Oracle Exadata’s access to even more business critical data sources. Utilize optimized out-of-the-box Informatica connectivity to easily access hundreds of data sources, including all the major databases, on-premise and cloud applications, mainframe, social data and Hadoop.
- Get more data, more quickly into Oracle Exadata. Move higher volumes of trusted data quickly into Exadata to support timely reporting with up-to-date information (i.e. up to 5x performance improvement compared to Oracle database).
- Centralize management and improve insight into large scale data warehouses. Deliver the necessary insights to stakeholders with intuitive data lineage and a collaborative business glossary. Contribute to high quality business analytics, in a timely manner across the enterprise.
- Instantly re-direct workloads and resources to Oracle Exadata without compromising performance. Leverage existing code and programming skills to execute high-performance data integration directly on Exadata by performing push down optimization.
- Roll-out data integration projects faster and more cost-effectively. Customers can now leverage thousands of Informatica certified developers to execute existing data integration and quality transformations directly on Oracle Exadata, without any additional coding.
- Efficiently scale-up and scale-out. Customers can now maximize performance and lower the costs of data integration and quality operations of any scale by performing Informatica workload and push down optimization on Oracle Exadata.
- Save significant costs involved in administration and expansion. Customers can now easily and economically manage large-scale analytics data warehousing environments with a single point of administration and control, and consolidate a multitude of servers on one rack.
- Reduce risk. Customers can now leverage Informatica’s data integration and quality platform to overcome the typical performance and scalability limitations seen in databases and data storage systems. This will help reduce quality-of-service risks as data volumes rise.
Oracle Exadata is a well-engineered system that offers customers out-of-box scalability and performance on demand. Informatica PowerCenter and Data Quality are optimized to run on Exadata, offering customers business benefits that speed up data integration and data quality tasks like never before. Informatica’s certified, optimized, and purpose-built solutions for Oracle can help you enable more timely and trustworthy reporting. You can now benefit from Informatica’s optimized solutions for Oracle Exadata to make better business decisions by unlocking the full potential of the most current and complete enterprise data available. As shown in our test results, you can attain up to 5x performance by scaling Exadata. Informatica Data Quality customers can profile 1TB datasets, which was previously unheard of. We urge you to deploy the combined solution to solve your data integration and quality problems today while achieving high-speed business analytics in these days of big data exploration and the Internet of Things.
Listen to what Ash Kulkarni, SVP, has to say at OOW14 about how @InformaticaCORP PowerCenter and Data Quality, certified by Oracle as optimized for Exadata, can deliver up to five times faster performance on data load, query, profiling, cleansing and mastering tasks for Exadata.
Do We Really Need Another Information Framework?
The EIM Consortium is a group of nine companies that formed this year with the mission to:
“Promote the adoption of Enterprise Information Management as a business function by establishing an open industry reference architecture in order to protect and optimize the business value derived from data assets.”
That sounds nice, but do we really need another framework for EIM or Data Governance? Yes we do, and here’s why.
When it comes to cloud-based data analytics, a recent study by Ventana Research (as found in Loraine Lawson’s recent blog post) provides a few interesting data points. The study reveals that 40 percent of respondents cited lowered costs as a top benefit, improved efficiency was a close second at 39 percent, and better communication and knowledge sharing also ranked highly at 34 percent.
Ventana Research also found that organizations cite a unique and more complex reason to avoid cloud analytics and BI. Legacy integration work can be a major hindrance, particularly when BI tools are already integrated with other applications. In other words, it’s the same old story: You can’t make sense of data that you can’t see.
The ability to deal with existing legacy systems when moving to concepts such as big data or cloud-based analytics is critical to the success of any enterprise data analytics strategy. However, most enterprises don’t focus on data integration as much as they should, and hope that they can solve the problems using ad-hoc approaches.
These approaches rarely work as well as they should, if at all. Thus, any investment made in data analytics technology is often diminished because the BI tools or applications that leverage analytics can’t see all of the relevant data. As a result, only part of the story is told by the available data, and those who leverage data analytics don’t rely on the information, and that means failure.
What’s frustrating to me about this issue is that the problem is easily solved. Those in the enterprise charged with standing up data analytics should put a plan in place to integrate new and legacy systems. As part of that plan, there should be a common understanding around business concepts/entities of a customer, sale, inventory, etc., and all of the data related to these concepts/entities must be visible to the data analytics engines and tools. This requires a data integration strategy, and technology.
As enterprises embark on a new day of more advanced and valuable data analytics technology, largely built upon the cloud and big data, the data integration strategy should be systemic. This means mapping a path for the data from the source legacy systems to the views that the data analytics systems should include. What’s more, this data should be in real operational time because data analytics loses value as the data becomes older and out-of-date. We operate in a real-time world now.
So, the work ahead requires planning to occur at both the conceptual and physical levels to define how data analytics will work for your enterprise. This includes what you need to see, when you need to see it, and then mapping a path for the data back to the business-critical and, typically, legacy systems. Data integration should be first and foremost when planning the strategy, technology, and deployments.
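As a small illustration of what a common understanding of a business entity can look like in practice, the Python sketch below maps rows from two systems onto one shared `Customer` shape. Everything in it is an assumption made for the example: the system names, the source field names and the three canonical fields. A real integration platform would manage these mappings for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical view of a customer shared by all analytics tools."""
    customer_id: str
    name: str
    country: str


# Hypothetical mappings: canonical field name -> field name in each source system.
FIELD_MAPS = {
    "legacy_crm": {"customer_id": "CUST_NO", "name": "CUST_NAME", "country": "CTRY"},
    "cloud_app": {"customer_id": "id", "name": "displayName", "country": "countryCode"},
}


def to_canonical(system: str, row: dict) -> Customer:
    """Translate one source row into the canonical Customer shape."""
    mapping = FIELD_MAPS[system]
    values = {field: str(row[source]).strip() for field, source in mapping.items()}
    return Customer(**values)


legacy_row = {"CUST_NO": 42, "CUST_NAME": " Acme Coatings ", "CTRY": "US"}
cloud_row = {"id": "42", "displayName": "Acme Coatings", "countryCode": "US"}

print(to_canonical("legacy_crm", legacy_row))
print(to_canonical("cloud_app", cloud_row))
```

Once both rows land in the same shape, the analytics layer can treat the legacy record and the cloud record as the same customer rather than as two unrelated rows.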
Amazon Redshift, one of the fast-rising stars in the AWS ecosystem, has taken the data warehousing world by storm ever since it was introduced almost two years ago. Amazon Redshift operates completely in the cloud, and allows you to provision nodes on-demand. This model allows you to overcome many of the pains associated with traditional data warehousing techniques, such as provisioning extra server hardware, sizing and preparing databases for loading or extensive SQL scripting.
However, when loading data into Redshift, you may find it challenging to do so in a timely manner. To reduce the time taken to load this data, you may have to spend a tremendous amount of time writing SQL optimization queries, which undermines the value proposition of using Redshift in the first place.
Informatica Cloud helps you load this data quickly into Redshift in just a few minutes. To start using Informatica Cloud, you’ll need to establish connections from Redshift and your other data source first. Here are a few easy steps to help you get started with establishing connections from a relational database such as MySQL as well as Redshift into Informatica Cloud:
- Log in to your Informatica Cloud account, go to Configure -> Connections, click “New”, and select “MySQL” for “Type”
- Select your Secure Agent and fill in the rest of the database details
- Test your connection and then click ‘OK’ to save and exit
- Now, log in to your AWS account and go to the Redshift service page
- Go to your cluster configuration page and make a note of the cluster and cluster database properties: Number of Nodes, Endpoint, Port, Database Name, JDBC URL. You will also need:
  - The Redshift database user name and password (which are different from your AWS account credentials)
  - AWS account Access Key
  - AWS account Secret Key
- Exit the AWS console.
- Now, back in your Informatica Cloud account, go to Configure -> Connections and click “New”.
- Select “AWS Redshift (Informatica)” for “Type” and fill in the rest of the details from the information you have from above
- Test the connection and then click ‘OK’ to save and exit
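If you like to double-check your notes before typing them into the connection form, a short Python sketch such as the one below can help. It is only a convenience check and does not call Informatica Cloud or AWS. The dictionary keys and sample values are made up for illustration, and the URL pattern assumes the `jdbc:redshift://host:port/database` form used by Amazon's Redshift JDBC driver, so treat the JDBC URL shown on your cluster page as the authority.

```python
# Hypothetical names for the details collected in the steps above.
REQUIRED_DETAILS = (
    "endpoint",
    "port",
    "database",
    "db_user",
    "db_password",
    "aws_access_key",
    "aws_secret_key",
)


def missing_details(details: dict) -> list:
    """Return the names of any required details that are absent or empty."""
    return [key for key in REQUIRED_DETAILS if not details.get(key)]


def jdbc_url(details: dict) -> str:
    """Assemble a JDBC URL to compare against the one on the cluster page."""
    return f"jdbc:redshift://{details['endpoint']}:{details['port']}/{details['database']}"


# Placeholder values only; never hard-code real credentials in a script.
details = {
    "endpoint": "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    "port": 5439,
    "database": "dev",
    "db_user": "analyst",
    "db_password": "",
    "aws_access_key": "",
    "aws_secret_key": "",
}

print("Still missing:", missing_details(details))
print("JDBC URL:", jdbc_url(details))
```

If the assembled URL differs from the one on your cluster page, one of the noted values is probably wrong, which is easier to catch here than after a failed connection test.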
As you can see, establishing connections was extremely easy and can be done in less than 5 minutes. To learn how customers such as UBM used Informatica Cloud to deliver next-generation customer insights with Amazon Redshift, please join us on September 16 for a webinar where we’ll have product experts from Amazon and UBM explaining how your company can benefit from cloud data warehousing for petabyte-scale analytics using Amazon Redshift.
Come and get it. For developers hungry to get their hands on Informatica on Hadoop, a downloadable free trial of Informatica Big Data Edition was launched today on the Informatica Marketplace. See for yourself the power of the killer app on Hadoop from the leader in data integration and quality.
Thanks to the generous help of our partners, the Informatica Big Data team has preinstalled the Big Data Edition inside the sandbox VMs of the two leading Hadoop distributions. This empowers Hadoop and Informatica developers to easily try the codeless, GUI driven Big Data Edition to build and execute ETL and data integration pipelines natively on Hadoop for Big Data analytics.
Informatica Big Data Edition is the most complete and powerful suite for Hadoop data pipelines and can increase productivity up to 5 times. Developers can leverage hundreds of out-of-the-box Informatica pre-built transforms and connectors for structured and unstructured data processing on Hadoop. With the Informatica Vibe Virtual Data Machine running directly on each node of the Hadoop cluster, the Big Data Edition can profile, parse, transform and cleanse data at any scale to prepare data for data science, business intelligence and operational analytics.
The Informatica Big Data Edition Trial Sandbox VMs will have a 60-day trial version of the Big Data Edition preinstalled inside a 1-node Hadoop cluster. The trials include sample data and mappings as well as getting started documentation and videos. It is possible to try your own data with the trials, but processing is limited to the 1-node Hadoop cluster and the machine you have it running on. Any mappings you develop in the trial can be easily moved on to a production Hadoop cluster running the Big Data Edition. The Informatica Big Data Edition also supports MapR and Pivotal Hadoop distributions; however, the trial is currently only available for Cloudera and Hortonworks.
Accelerate your ability to bring Hadoop from the sandbox into production by leveraging Informatica’s Big Data Edition. Informatica’s visual development approach means that more than one hundred thousand existing Informatica developers are now Hadoop developers without having to learn Hadoop or new hand coding techniques and languages. Informatica can help organizations easily integrate Hadoop into their enterprise data infrastructure and bring the PowerCenter data pipeline mappings running on traditional servers onto Hadoop clusters with minimal modification. Informatica Big Data Edition reduces the risk of Hadoop projects and increases agility by enabling more of your organization to interact with the data in your Hadoop cluster.
To get the Informatica Big Data Edition Trial Sandbox VMs and more information, please visit the Informatica Marketplace.
The way I see it, the biggest impact of the Apple Watch will come from how it will finally make data fashionable. For starters, the three Apple Watch models and interchangeable bands will actually make it hip to wear a watch again. But I think the ramifications of this genuinely good-looking watch go well beyond the skin deep. The Cupertino company has engineered its watch and its mobile software to recognize related data and seamlessly share it across relevant apps. And those capabilities allow it to, for instance, monitor our fitness and health, show us where we parked the car, open the door to our hotel room and control our entertainment centers.
Think what this could mean for any company with a Data-First point of view. I like to say that a data-first POV changes everything. With it, companies can unleash the killer app, killer marketing campaign and killer sales organization. The Apple Watch finally gives people a reason to have that killer app with them at all times, wherever they are and whatever they’re doing. Looked at a different way, it could unleash a new culture of Data-Only consumers: People who rely on being told what they need to know, in the right context.
But while Apple may be the first to push this Data-First POV in unexpected ways, history has shown they won’t be the last. It’s time for every company to tap into the newest fashion accessory, and make data their first priority.
This got me thinking: What is the biggest bottleneck in the delivery of business value today? I know I look at things from a data perspective, but data is the biggest bottleneck. Consider this prediction from Gartner:
“Gartner predicts organizations will spend one-third more on app integration in 2016 than they did in 2013. What’s more, by 2018, more than half the cost of implementing new large systems will be spent on integration.”
When we talk about application integration, we’re talking about moving data, synchronizing data, cleansing data, transforming data and testing data. The question for architects and senior management is this: Do you have the Data Foundation for Execution you need to drive the business results you require to compete? The answer, unfortunately, for most companies is: No.
All too often data management is an add-on to larger application-based projects. The result is unconnected and non-interoperable islands of data across the organization. That simply is not going to work in the coming competitive environment. Here are a couple of quick examples:
- Many companies are looking to compete on their use of analytics. That requires collecting, managing, and analyzing data from multiple internal and external sources.
- Many companies are focusing on a better customer experience to drive their business. This again requires data from many internal sources, plus social, mobile and location-based data to be effective.
When I talk to architects about the business risks of not having a shared data architecture, and common tools and practices for enterprise data management, they “get” the problem. So why aren’t they addressing it? The issue is that they find that they are only funded to do the project they are working on and are dealing with very demanding timeframe requirements. They have no funding or mandate to solve the larger enterprise data management problem, which is getting more complex and brittle with each new un-connected project or initiative that is added to the pile.
Studies such as “The Data Directive” by The Economist show that organizations that actively manage their data are more successful. But, if that is the desired future state, how do you get there?
Changing an organization to look at data as the fuel that drives strategy takes hard work and leadership. It also takes a strong enterprise data architecture vision and strategy. For fresh thinking on the subject of building a data foundation for execution, see “Think Data-First to Drive Business Value” from Informatica.
* By the way, Informatica is proud to announce that we are now a sponsor of the MIT Center for Information Systems Research.
Last time I talked about how benchmark data can be used in IT and business use cases to illustrate the financial value of data management technologies. This time, let’s look at additional use cases, and at how to philosophically interpret the findings.
So here are some additional areas of investigation for justifying a data quality based data management initiative:
- Data and report preparation and rebuttal for compliance or any other audits (FTE cost as above)
- Excess insurance premiums on incorrect asset or party information
- Excess tax payments due to incorrect asset configuration or location
- Excess travel or idle time between jobs due to incorrect location information
- Excess equipment downtime (not revenue generating) or MTTR due to incorrect asset profile or misaligned reference data not triggering timely repairs
- Incorrect equipment location or ownership data splitting service cost or revenues incorrectly
- Party relationship data not tied together creating duplicate contacts or less relevant offers and lower response rates
- Lower than industry average cross-sell conversion ratio due to inability to match and link departmental customer records and underlying transactions and expose them to all POS channels
- Lower than industry average customer retention rate due to lack of full client transactional profile across channels or product lines to improve service experience or apply discounts
- Low annual supplier discounts due to incorrect or missing alternate product data or aggregated channel purchase data
I could go on forever, but allow me to touch on a sensitive topic – fines. Fines, or performance penalties by private or government entities, only make sense to bake into your analysis if they happen repeatedly in fairly predictable intervals and are “relatively” small per incidence. They should be treated like M&A activity. Nobody will buy into cost savings in the gazillions if a transaction only happens once every ten years. That’s like building a business case for a lottery win or a life insurance payout with a sample size of a family. Sure, if it happens you just made the case but will it happen…soon?
Use benchmarks and ranges wisely but don’t over-think the exercise either. It will become paralysis by analysis. If you want to make it super-scientific, hire an expensive consulting firm for a 3 month $250,000 to $500,000 engagement and have every staffer spend a few days with them away from their day job to make you feel 10% better about the numbers. Was that worth half a million dollars just in 3rd party cost? You be the judge.
In the end, you are trying to find out, and to position, whether a technology will fix a $50,000, $5 million or $50 million problem. You are also trying to gauge where key areas of improvement are in terms of value and correlate the associated cost (higher value normally equals higher cost due to higher complexity) and risk. After all, who wants to stand before a budget committee, prophesy massive savings in one area and then fail because it would have been smarter to start with a simpler and quicker win to build upon?
The secret sauce to avoiding this consulting expense and risk is a natural curiosity, willingness to do the legwork of finding industry benchmark data, knowing what goes into them (process versus data improvement capabilities) to avoid inappropriate extrapolation and using sensitivity analysis to hedge your bets. Moreover, trust an (internal?) expert to indicate wider implications and trade-offs. Most importantly, you have to be a communicator willing to talk to many folks on the business side and have criminal interrogation qualities, not unlike in your run-of-the-mill crime show. Some folks just don’t want to talk, often because they have ulterior motives (protecting their legacy investment or process) or hiding skeletons in the closet (recent bad performance). In this case, find more amenable people to quiz or pry the information out of these tough nuts, if you can.
Lastly, if you find ROI numbers that appear astronomical at first, remember that leverage is a key factor. If a technical capability touches one application (credit risk scoring engine), one process (quotation), one type of transaction (talent management self-service) or a limited set of people (procurement), the ROI will be lower than for a technology touching several of each of the aforementioned. If your business model drives thousands of high-value (thousands of dollars) transactions versus ten twenty-million dollar ones or twenty-million one-dollar ones, your ROI will be higher. After all, consider this: retail e-mail marketing campaigns average an ROI of 578% (softwareprojects.com), and this with really bad data. Imagine what improved data can do just on that front.
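To show how leverage and sensitivity analysis play together, here is a back-of-the-envelope Python sketch. It assumes the simplest possible model, where the benefit is the number of transactions times the value per transaction times an assumed improvement rate, and ROI is (benefit - cost) / cost. All figures are invented for illustration and are not benchmarks.

```python
def roi(benefit: float, cost: float) -> float:
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def roi_sensitivity(transactions, value_per_transaction, improvement_rates, cost):
    """Compute ROI for each assumed improvement rate (a simple sensitivity sweep)."""
    results = {}
    for rate in improvement_rates:
        benefit = transactions * value_per_transaction * rate
        results[rate] = roi(benefit, cost)
    return results


# Hypothetical scenario: 5,000 transactions worth $2,000 each, $150,000 total cost,
# and three guesses at how much value better data recovers per transaction.
scenarios = roi_sensitivity(5000, 2000.0, [0.01, 0.02, 0.05], 150000.0)
for rate, value in scenarios.items():
    print(f"improvement {rate:.0%}: ROI {value:.0%}")
```

With these made-up inputs the break-even improvement rate is 1.5 percent, so the pessimistic guess loses money while the other two pay off. That spread is exactly what you want to show a budget committee instead of a single headline number.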
I found massive differences between what improved asset data can deliver in a petrochemical or utility company versus product data in a fashion retailer or customer (loyalty) data in a hospitality chain. The assertion of cum hoc ergo propter hoc is a key assumption in how technology delivers financial value. As long as the business folks agree or can fence in the relationship, you are on the right path.
What’s your best and worst job to justify someone giving you money to invest? Share that story.