The Embarrassing Reality of Data Integration
As much as hand-coding is decried as a pointless waste of time when it comes to data integration, there are far worse approaches. These are the truly manual data integration efforts that go on in every department in every company and organization across the globe. This is Excel-based, manual data integration, where people within a business unit pull together data exports (“extract”) from various systems, manually clean and merge the data (“transform”), then send it in an email (“load”) to the “target” user of that data. This is the most rudimentary of all “ETL” (Extract-Transform-Load) approaches. Often the people performing this data integration are not particularly skilled at using Excel; they follow a self-defined “process” that is not even a best practice for Excel, and that they don’t repeat from one time to the next. This is the true nadir of data integration and ETL, and yet it is probably its most common incarnation.
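For contrast, the same extract, transform and load steps can be written down once as a small, repeatable script. The sketch below is only an illustration under stated assumptions: the sample data, the column names (`customer_id`, `email`, `amount`) and the function names are all invented here, and the “load” step simply produces CSV text rather than sending an email.

```python
import csv
import io


def extract(csv_text):
    """Extract: parse one exported CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(customers, orders):
    """Transform: clean the keys, then merge order totals onto customers."""
    totals = {}
    for order in orders:
        key = order["customer_id"].strip()
        totals[key] = totals.get(key, 0.0) + float(order["amount"])
    merged = []
    for customer in customers:
        key = customer["customer_id"].strip()
        merged.append({
            "customer_id": key,
            "email": customer["email"].strip().lower(),
            "total_amount": f"{totals.get(key, 0.0):.2f}",
        })
    return merged


def load(rows):
    """Load: serialize the merged rows as CSV text for the target user."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["customer_id", "email", "total_amount"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_etl(customers_csv, orders_csv):
    """Run the three steps in order, the same way every time."""
    return load(transform(extract(customers_csv), extract(orders_csv)))


# Hypothetical exports from two systems (invented sample data).
CUSTOMERS = "customer_id,email\n 1 ,Ann@Example.com\n2,bob@example.com\n"
ORDERS = "customer_id,amount\n1,10.50\n1,4.50\n2,3.00\n"

if __name__ == "__main__":
    print(run_etl(CUSTOMERS, ORDERS))
```

The point is not the specific cleaning rules but that each step is named, ordered, and repeated identically from one run to the next.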
We are All Data Integrators
At the heart of this problem is not inaccessible Enterprise software, over-worked ICCs, or over-extended Enterprise IT groups. At the heart of this problem is a failure of these people to identify the work they do as data integration, and to understand that data integration is a key part of their job, and as such needs to be a core competency they seek to develop. But if data integration is to be a core competency developed by the tens of millions of office workers manually integrating data, there must be an evolution in the tools they use—in price, quality and usability.
ETL Tools Everywhere, but Where’s the One for Me?
To make data integration something that is embraced and practiced by millions, entry-level data integration/ETL tools must be accessibly priced, in line with the Office suite of products at the most. Open source tools have made a great contribution to pushing pricing in this direction. But open source tools lack the quality and ease of use that will foster this level of adoption.
The current crop of enterprise ETL tools, while high quality and fit for enterprise volumes and complexity, brings too much functionality for this use case. The hordes of manual data integrators need sophisticated tools, but they need to be able to navigate them easily, not get bogged down in functionality defined for long-tail use cases.
PowerCenter Express: The Antidote to Excel-Based ETL
PowerCenter Express, Informatica’s free data integration (ETL) and profiling software, provides a solution to this problem. It offers Informatica’s mature PowerCenter product, complete with a full transformation library, workflow, and even in-line data profiling. PowerCenter Express, though, has streamlined functionality to ensure ease of use. Installation is easy, taking less than ten minutes and requiring no configuration. Contextual cheat sheets, embedded tutorials, and full access to Informatica’s online MySupport portal guarantee even true novices will succeed with data integration.
It will take a while before these people realize that they are acting as “data integration hubs.” When they do, and when they realize they are at the heart of their own complicated data integration hairball, PowerCenter Express will be there to help them to extricate themselves.
Data integration is reaching a new level of maturity, a level that recognizes the exigencies of corporate standards, but understands the realities of departmental needs. Rather than a unified, centralized approach, this is a federated approach to data integration. Federated data integration, just like a federated approach to government, creates a balance between the need for central control and consistency at the highest level, e.g. a national government or enterprise IT group, and the need for autonomy at lower levels of organization, e.g. states or departments within a larger enterprise. Federated data integration differs (substantially) from data federation. Data federation brings together data from many disparate sources into one virtual database. Federated data integration does the same thing for the mappings that define how data is integrated. The link is in the federated approach to management.
The catalyst for this federated approach to data integration is the tension between Enterprise IT groups and departmental or line-of-business IT. Departments and lines of business “go rogue” to solve an immediate need, only to find they ultimately need to tie back into other enterprise systems, or to extract data from them. Enterprise IT, which faces ever-mounting backlogs of enterprise-wide and departmental requests, is unable to service all the requests it receives, and at the same time must acknowledge that those requests (most of them at least) do have pressing business value.
Most of both the spoils and the casualties of this war are data-related. One party needs data for a business purpose; another party can’t share it for fear of negatively impacting another business purpose or policy. Marketing needs data to execute a marketing campaign; legal needs data from marketing to ensure compliance with government regulations; sales needs to keep data under wraps to ensure a customer is being treated in a consistent manner throughout the sales process. As data is the source of most of the strife, so too will the resolution of its mutual and complementary use provide the solution. Hence the advent of federated data integration.
To do this, to create this federation of data, the organization needs, like a federation of states, a common currency. Data is the chief “currency” within and beyond an organization. As such, it must be possible to “transact” that currency, to move that currency from one area of an organization to the next. Data integration provides the platform that makes it possible to transact data. It allows an organization to define what data needs to move where and provides the platform for facilitating that movement. Data integration also allows organizations to specify what conversions (“transformations” is the word used in data integration parlance) are necessary for the people on both sides of that “transaction” to understand and consume that data. Essentially, data integration is the true information broker in an organization.
With an un-federated approach to data integration, different parts of an organization do data integration, well, differently. These different approaches to data integration effectively create different data “currencies” across the organization. While they provide a consistent data “currency” and brokerage of that data currency within a specific part or parts of the organization, that data cannot be easily moved and understood beyond those areas. In order to move from one “data integration state” to the next, say, from the “Informatica state” to the “SQL state”, the person wanting the data must “exchange it,” or more specifically, perform yet another integration from one data integration platform to the next. This kind of friction hurts business results, because much data that should be shared and communicated is not, due to the difficulty of moving it.
One of the biggest reasons for the creation of these different “data integration states” is that different parts of an organization have different data integration platforms. As such, the mappings (the directions that specify where and how data is supposed to be moved and transformed) are not portable from one platform to the next, and this is what spawns that multitude of data currencies. This is precisely the reason that Enterprise IT is so focused on standards and standardization. The reality, though, is that if departments or lines of business are unable to get their data integration done by Enterprise IT, they will still get it done; they cannot be in a position where they are unable to move this vital currency. Typically, these departments and lines of business don’t have the level of funding that would allow them to use the enterprise-class tools favored by Enterprise IT. As a result, departments turn to hand-coding and open source solutions to meet their data integration needs. And each new approach creates a new data “currency” that will need to be “converted” into the organization’s preferred data “currency.”
Two things now make this kind of fragmentation and friction unnecessary. The first is Vibe. Vibe is the technology underlying Informatica’s data integration platform, the platform that makes it possible to do a mapping (specify where and how data needs to be moved and transformed) and deploy it anywhere. The second is PowerCenter Express. PowerCenter Express is a free data integration and profiling product from Informatica that runs on Vibe.
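The general idea of specifying a mapping once and running it in more than one place can be illustrated with a toy sketch. To be clear, this is not Vibe and does not use any Informatica API; the `FieldRule` structure, the column names and the two “engines” below are hypothetical, chosen only to show a mapping kept as plain data, separate from whatever executes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    source: str     # column name in the source
    target: str     # column name in the target
    transform: str  # one of: "copy", "upper", "trim"


# Hypothetical mapping, defined once as plain data.
MAPPING = [
    FieldRule("cust_name", "customer_name", "trim"),
    FieldRule("country", "country_code", "upper"),
]

# Each engine supplies its own implementation of the same named transforms.
PYTHON_OPS = {"copy": lambda v: v, "upper": str.upper, "trim": str.strip}
SQL_OPS = {"copy": "{col}", "upper": "UPPER({col})", "trim": "TRIM({col})"}


def run_in_memory(mapping, rows):
    """Engine 1: apply the mapping to rows held in memory."""
    return [
        {rule.target: PYTHON_OPS[rule.transform](row[rule.source]) for rule in mapping}
        for row in rows
    ]


def to_sql(mapping, source_table):
    """Engine 2: render the same mapping as a SELECT statement."""
    columns = ", ".join(
        f"{SQL_OPS[rule.transform].format(col=rule.source)} AS {rule.target}"
        for rule in mapping
    )
    return f"SELECT {columns} FROM {source_table}"


if __name__ == "__main__":
    print(run_in_memory(MAPPING, [{"cust_name": "  Ann ", "country": "us"}]))
    print(to_sql(MAPPING, "crm_export"))
```

Because the mapping itself never changes, moving from one engine to the other requires no “currency conversion” of the mapping, which is the property the federated approach depends on.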
Because PowerCenter Express is available free of charge, it offers a way for departments and lines-of-business to satisfy their data integration needs, without creating a new data “currency” that will need to be converted back into the standard corporate currency. The reality (not just Informatica’s reality) is that the majority of large enterprises have chosen Informatica as their corporate standard for data integration. With the advent of PowerCenter Express, departments within those organizations can now create mappings that will tie their data into that corporate data currency—thanks to Vibe. With Vibe, a mapping can be deployed anywhere—on PowerCenter Express, on Informatica Cloud, on PowerCenter Big Data Edition—regardless of where that mapping was developed.
The combination of Vibe and PowerCenter Express is what makes possible this new era of federated data integration. It makes it possible for organizations to achieve a balance between central control—the need to create a common data currency that moves freely throughout the organization on the data integration platform—and autonomy—allowing departments to satisfy their own data integration needs, without waiting for Enterprise IT. In fact, this federation is even broader than an individual enterprise. Already, consultants are developing mappings off-site using PowerCenter Express, and deploying them on Enterprise versions of PowerCenter.
PowerCenter Express is now available to the general public on Informatica’s Marketplace. The response from the first group of users—InformaticaWorld attendees who were given early access—has been incredibly positive. In fact, 95% of those users would recommend PowerCenter Express to a friend.
Despite being free, PowerCenter Express is incredibly feature-rich. This is no dumbed-down version of PowerCenter; rather, it is PowerCenter tailored to entry-level volumes and workloads. A survey of users shows they plan to push PowerCenter Express up to the limits of that tailoring. Not surprisingly, more than 80% plan on using the database connectors. Somewhat more surprisingly, well more than a third plan to use the social media connectors, showing that this data is no longer the province of the most technically advanced companies, but a business requirement. A full 50% plan to use the in-line data profiling, and 90% of those will be doing sophisticated rule-based profiling. Nearly two-thirds of those surveyed plan on using workflows to orchestrate complex, multi-step transformations. More than half of users plan to avail themselves of the mid-stream data viewer, a cool technology that lets users see what happens to data during each transformation, before deploying.
PowerCenter Express is aimed at a new user segment: departmental IT organizations and small to medium-sized organizations. Because of this, and the greater ease of access to PowerCenter Express, we knew people would find different uses for it than they have for PowerCenter Enterprise editions. While we had some ideas for how they might choose to use it, it’s been really exciting to hear how they actually are using it. The most common use cases we’ve heard thus far are:
- Small projects moving systems/applications data to databases
- Initial projects that they plan to migrate to PowerCenter Enterprise once critical mass is achieved
- For training purposes, both to keep their own skills sharp and to train others, especially non-ETL developers
- To empower technical business analysts to do the groundwork for basic interfaces
- For proofs of concept to pilot cool ideas that may or may not make it into production in the enterprise
We have no doubt this list will grow, and we look forward to hearing from you how you are using it. In the meantime, it’s exciting to see that the PowerCenter Express promise (start now, start small, scale fast) is being realized. It’s also very gratifying to see that these users have so easily and intuitively grasped the promise of Vibe, the promise to map once, and deploy anywhere. It is that power and promise that is enabling them to move the mappings they create in PowerCenter Express directly to PowerCenter Enterprise, or to the Informatica Cloud. While we love that people get the concept of Vibe, what we love most is that people aren’t wasting time re-doing their work as their deployment choices change.
As always, let me know your thoughts.
We all know that everywhere we go, physically or virtually, we leave data contrails. What is interesting is our response to that fact, which is bipolar at best. When we want people to help us, e.g. when we call our credit card issuer, we fully expect that they should have a completely integrated view of us and all our data. On the other hand, when it comes to the data contrails that we passively generate as we live our lives, we actually have quite a bit of faith in data dis-integration: that no company or organization a) cares enough about that data to integrate it or b) is sophisticated enough to successfully integrate it even if they cared enough to want to.
What is interesting about these attitudes is that they are both underpinned by a sort of intuitive understanding that integrating data is really hard. We expect that our credit card issuer can integrate data because issuers are typically large, sophisticated companies with significant resources; we expect substantially less from the online retailer where we buy lederhosen for an Oktoberfest party. We also have an intuitive understanding of the difficulty of pulling together data from different sources, and of different types, particularly unstructured types. It is just this intuitive sense of difficulty that makes us feel OK about how much and how fully our lives are sketched out by our data contrails. Then we hear the leaks from Edward Snowden, and our faith in data dis-integration is undermined.
The reality is that even as the sources and variety of data are increasing, the sophistication of the tools available for integrating those data is increasing apace. More importantly, even as the sophistication is increasing, the difficulty of using those tools is decreasing proportionally, to the point that some are genuinely easy to use.
What’s especially exciting about this trend is that now “the little guy” (say, that guy selling lederhosen online from his house in L.A.) can integrate data from the multiple different applications he uses to run his business, and use it to improve and grow that business.
Of course there’s another side of that sword, the Edward Snowden side. That side is one whose sharpness we will ascertain with time. The complexities of that side will be teased out as we as individuals specify and insist on what data we want accessed by whom, and under what circumstances, and further what tradeoffs we are willing to make for convenience and relevance. From a business perspective this granular approach to data and data access requires a very sophisticated understanding and application of data and data integration. And because the norms associated with this subject are still evolving, it requires that organizations take the high road, or end up on the front page as “Exhibit A” of how not to treat people’s data. That means trying to envision, on behalf of the people whose data you are gathering and using, how you would want that data used if it were yours, and making those decisions clear to those same people.
We’re really excited to see the momentum that PowerCenter Express has after its private launch at InformaticaWorld. With this momentum have come a lot of questions. One of the more common questions I’ve gotten is “When can I share this?” The answer is NOW. If you’ve got the URL, forward it to everyone you know. We want the ETL pros (those of you who were at InformaticaWorld) to use and abuse PowerCenter Express, and we want you to send it to your friends so they can use it and abuse it, too. Yes, you’ve got special early access, but what’s the fun of a secret if you can’t share it? Share, share!
Another question I’ve gotten is whether or not there are any restrictions on posting videos of PowerCenter Express on YouTube. The answer is we absolutely WANT YOU TO POST your videos! In fact, there’s already one up now, posted by DataSource Consulting: http://www.youtube.com/watch?v=5qHflvMqze0. If you’ve installed PowerCenter Express, go ahead and upload your own videos, then tweet them out using the hashtag #PCeX. Or send them to me, and I’ll tweet them out and post them, too. You can get more sneak peeks of PowerCenter Express by checking out the videos on the PowerCenter Express YouTube channel: http://www.youtube.com/playlist?list=PLmi6HWWEAjKqIT1HvXyp5SKy_uehVoeQR
The public launch is coming up fast, and we’ll be launching a lot in the way of communities and other resources to help you get the most out of PowerCenter Express. One of the ways that we think you’ll get the most out of PowerCenter Express is by being able to share and interact with others who are using it, and we want to start that process of sharing as early as possible.
P.S. I have my own little secret; in one of the videos posted I noticed some rather interesting links showing up in the screen shots…
At InformaticaWorld, we made a very exciting announcement: the introduction of PowerCenter Express, our entry-level data integration and profiling tool. What is PowerCenter Express, exactly? Well, in a nutshell, it’s giving the Power of PowerCenter to everyone, “to the people” if you like. We made PowerCenter Express available to all attendees at InformaticaWorld, and they’ll be able to install it and be up and running in less than ten minutes. Since it’s PowerCenter, they’ll be able to scale up to enterprise-class capabilities whenever they need to, using Vibe, our “Map Once, Deploy Anywhere” technology. Starting in July, PowerCenter Express will be generally available to everyone as a free download from Informatica’s Marketplace.
What we are doing with PowerCenter Express is making sure that everyone, including departments and growing businesses, has access to PowerCenter’s high-quality data integration and profiling tools. Until now the options for these groups have been limited to hand coding or open source products. Neither of these options can scale to handle enterprise-class data integration requirements. That meant that, before the advent of PowerCenter Express, when these smaller organizations reached the point where they needed enterprise-class capabilities and had to migrate to an enterprise data integration tool, they had no choice but to scrap all of their prior work. We don’t want that to happen anymore. We don’t want anyone to have to re-write mappings, to re-do work, ever. We want people to be able to map once, and deploy anywhere. And that’s what PowerCenter Express makes possible: any organization, no matter how small, can start with PowerCenter (the gold standard for data integration) and stay with PowerCenter, re-using those same mappings when they transition to enterprise class, or when they want to deploy those mappings to Hadoop.
The reality is, as organizations’ data integration complexity reaches a certain point, they end up coming to Informatica for the best products, the best support and the biggest ecosystem of developers. But in the past, for smaller organizations, starting with the fully functional PowerCenter wasn’t always the best option. With PowerCenter Express, organizations can start small, start now, and scale fast. PowerCenter Express offers a real choice and future protection for entry-level data integration.
If you’d like to learn more about PowerCenter Express before the public launch, shoot me an email at EBurns@Informatica.com. And start following me here; I’ll be posting a lot about this exciting new product over the coming weeks and months.
Emily V. Burns
Sr. Product Marketing Manager, PowerCenter Express