What You Need for Modern Data Integration (Hint: It’s Not Just Batch or ETL)
For most customers, today’s IT landscape is increasingly hybrid and cloud-based, and the lines between application integration and data integration are starting to blur. Still, “data remains data,” whether it lives in the cloud or on-premises, and every organization needs data integration to run its business.
Data integration is not the same as batch processing. The transformations required to translate one message format into another (e.g., what an ESB does) are not equivalent to the set-based algorithms required to integrate heterogeneous data. Concatenating First_Name and Last_Name is not data integration. I could use shell scripts and FTP to process data in parallel, but that wouldn’t be data integration either. Data integration is much more than batch processing or Extract, Transform and Load (ETL).
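To make the contrast concrete, here is a minimal sketch, assuming two hypothetical sources (a CRM list and a billing list with different field names and inconsistently formatted email keys). The row-level `full_name` helper is the kind of per-record transformation the paragraph dismisses; the set-level work in `integrate` (conforming keys, aggregating, hash-joining, and finding orphan records by set difference) is closer to what integrating heterogeneous data requires. All source names, fields, and helpers are illustrative, not any product’s API.

```python
from collections import defaultdict

# Hypothetical source A: CRM contacts.
crm_rows = [
    {"Email": "Ana@Example.com", "First_Name": "Ana", "Last_Name": "Silva"},
    {"Email": "bo@example.com", "First_Name": "Bo", "Last_Name": "Chen"},
]

# Hypothetical source B: billing invoices, with a different schema and messier keys.
billing_rows = [
    {"customer_email": "ana@example.com ", "amount": 120.0},
    {"customer_email": "ANA@example.com", "amount": 80.0},
    {"customer_email": "cy@example.com", "amount": 50.0},
]


def full_name(row: dict) -> str:
    """A row-level transformation: useful, but not data integration by itself."""
    return f"{row['First_Name']} {row['Last_Name']}"


def conform_key(email: str) -> str:
    """Normalize email addresses so both sources share one join key."""
    return email.strip().lower()


def integrate(crm: list, billing: list):
    """Set-based integration: conform keys, aggregate, hash-join, and report orphans."""
    # Build phase: aggregate the billing set by conformed key.
    totals = defaultdict(float)
    for row in billing:
        totals[conform_key(row["customer_email"])] += row["amount"]

    # Probe phase: enrich every CRM record with its billing total.
    crm_keys = set()
    customers = []
    for row in crm:
        key = conform_key(row["Email"])
        crm_keys.add(key)
        customers.append({
            "email": key,
            "name": full_name(row),
            "total_billed": totals.get(key, 0.0),
        })

    # Set difference: billing activity with no matching CRM record.
    orphans = set(totals) - crm_keys
    return customers, orphans
```

Calling `integrate(crm_rows, billing_rows)` attributes both of Ana’s invoices to one customer despite the differently formatted keys, and surfaces `cy@example.com` as billing activity with no CRM record, something a per-message field mapping cannot see.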
So, what are the core capabilities of data integration?
“Transformation” is not multiplying Unit_Price by Quantity. Transformation is fuzzy-joining heterogeneous data in parallel using in-memory caching. Transformation is adding address cleansing to that same data pipeline. Transformation is handling the ragged hierarchies of nested documents, or applying natural language or sentiment processing to unstructured data. And once transformations are defined, there is a growing number of engines that can execute them: an RDBMS such as Teradata or Oracle, a massively parallel clustering framework such as Hadoop, or a grid of Informatica services. Using a metadata-driven, visual diagramming environment and Vibe, which can “Map Once, Deploy Anywhere,” Informatica separates “what” transformations are desired from “how” and “where” they are executed. An intelligent optimizer pushes the work to the engine best suited to run it, all without requiring MapReduce, BTEQ, PL/SQL, Java, Python, or other specialized skills.
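As a rough illustration of what “fuzzy joining with in-memory caching” means, here is a single-process sketch. It is not how Informatica or any execution engine implements it: it uses Python’s `difflib.SequenceMatcher` ratio as a stand-in similarity measure, caches the right-hand side in memory grouped (“blocked”) by first character to limit comparisons, and keeps the best match above a threshold. The 0.85 threshold, the blocking rule, and the sample company names are all assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_join(left, right, key_left, key_right, threshold=0.85):
    """Pair each left row with its most similar right row scoring >= threshold."""
    # Cache the right side in memory, blocked by first character,
    # so each left row is compared against a small candidate set.
    blocks = defaultdict(list)
    for row in right:
        norm = normalize(row[key_right])
        if norm:
            blocks[norm[0]].append((norm, row))

    matches = []
    for row in left:
        norm_l = normalize(row[key_left])
        if not norm_l:
            continue
        best, best_score = None, 0.0
        for norm_r, candidate in blocks.get(norm_l[0], []):
            score = similarity(norm_l, norm_r)
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            matches.append((row, best, round(best_score, 3)))
    return matches


# Hypothetical sample data from two systems that spell names differently.
left_rows = [{"customer": "Acme Corp."}, {"customer": "Glbex Inc"}, {"customer": "Umbrella"}]
right_rows = [{"vendor": "ACME Corp"}, {"vendor": "Globex, Inc."}, {"vendor": "Initech"}]
```

Here `fuzzy_join(left_rows, right_rows, "customer", "vendor")` pairs “Acme Corp.” with “ACME Corp” after normalization, tolerates the typo in “Glbex Inc”, and leaves “Umbrella” unmatched. Blocking on the first character trades recall for speed: a typo in the first letter would be missed, which is why production matchers typically use several blocking keys.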
Today, data integration must be possible at any latency: real-time, event-driven, or batch. Once we’ve cleansed and conformed a logical definition of data, we want to make that data available anytime and anywhere, limited by security policy rather than by tool capability. True data integration should support Change Data Capture (CDC) wherever it helps. Wouldn’t we rather move, say, 1/1000th of the data by processing only the changes, instead of moving everything every time? Building integration services on changed data, at any latency, using common transformation definitions and without altering source applications or configuring database triggers, makes data movement and change management as efficient as possible.
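The core idea of log-based CDC, consuming only the changes recorded since the last run instead of reloading everything, can be sketched as follows. This is a simulation, not a real CDC connector: the in-memory list stands in for a database transaction log, and the `ChangeEvent` structure, its `lsn` (log sequence number) field, and the checkpoint handling are hypothetical names chosen for the example. Real CDC tools read the source database’s own log, which is what lets them avoid triggers or application changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a simulated change log (hypothetical structure)."""
    lsn: int              # log sequence number: position in the source's change log
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def apply_changes(log, target: dict, checkpoint: int) -> int:
    """Apply only events after `checkpoint` to `target`; return the new checkpoint."""
    for event in sorted(log, key=lambda e: e.lsn):
        if event.lsn <= checkpoint:
            continue  # already replicated in an earlier run
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = event.row
        else:
            raise ValueError(f"unknown op {event.op!r}")
        checkpoint = event.lsn  # remember how far we have processed
    return checkpoint


# Hypothetical change log for a small customer table.
change_log = [
    ChangeEvent(1, "insert", 1, {"name": "Ana"}),
    ChangeEvent(2, "insert", 2, {"name": "Bo"}),
    ChangeEvent(3, "update", 1, {"name": "Ana Silva"}),
    ChangeEvent(4, "delete", 2, None),
]
```

A first run over the first two events leaves the checkpoint at 2; a later run over the whole log skips those and applies only events 3 and 4. The amount of work per run is proportional to what changed, not to the size of the table.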
Metadata and Transactional Intelligence
Statistical knowledge of what the data actually contains is so important to the people collaborating on data integration problems that Informatica put this visibility directly into the workflow of SaaS administrators, business analysts, data stewards, architects, and developers. Identifying data domains, inferring relationships, and tracking lineage for impact analysis are tasks that computers excel at. With intelligent data integration, users’ own knowledge is augmented by metadata and by knowledge of the data itself, which makes integration work and business/IT collaboration much more agile. Knowing what constitutes a transaction in each application, how master data is modeled, and how logs and data interchange formats are defined is crucial to reliably handling the complexities of enterprise data. This metadata intelligence also lets business users access the data they need directly, when they need it.
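Lineage and impact analysis are graph problems, which is why they suit computers so well. The sketch below assumes lineage metadata is available as a simple adjacency map (“this asset feeds that one”); the asset names are hypothetical. Impact analysis is a breadth-first walk downstream, and lineage is the same walk over the reversed edges. A metadata repository would store far richer information; this only shows the traversal.

```python
from collections import deque

# Hypothetical lineage metadata: each key feeds the assets listed for it.
LINEAGE = {
    "crm.contacts": ["staging.customers"],
    "billing.invoices": ["staging.revenue"],
    "staging.customers": ["warehouse.dim_customer"],
    "staging.revenue": ["warehouse.fact_sales"],
    "warehouse.dim_customer": ["reports.churn", "warehouse.fact_sales"],
    "warehouse.fact_sales": ["reports.quarterly_revenue"],
}


def impact_of(node: str, lineage: dict) -> set:
    """Every asset downstream of `node`, found by breadth-first search."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream_of(node: str, lineage: dict) -> set:
    """Every asset `node` is derived from: the same search over reversed edges."""
    reverse = {}
    for source, targets in lineage.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return impact_of(node, reverse)
```

For example, `impact_of("crm.contacts", LINEAGE)` answers “what breaks if the CRM contacts schema changes?” (five downstream assets, including both reports), while `upstream_of("reports.quarterly_revenue", LINEAGE)` answers “where does this number come from?”.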
Self-Service + Governance
Thanks to the cloud, the business has many more options for taking direct control of the data it needs, and modern data integration solutions must fully support business self-service without IT’s direct involvement. Complementary self-service data integration capabilities within cloud and hybrid IT landscapes offer the agility the business requires and the governance that IT desires.
Rather than central IT serving as the gatekeeper for all data progress, self-service integration capabilities for SaaS admins and business analysts that log, audit, and govern the behavior of line-of-business users can empower more people to participate, while still supporting corporate objectives for data security and regulatory compliance. In addition, with cloud apps such as Salesforce.com becoming mission-critical business hubs for many companies, data integration needs to be available within the context of the cloud app itself, creating a seamless experience for SaaS users.
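The “log, audit and govern” idea can be sketched as a policy check that records every self-service request, whether it is allowed or denied. This is a toy model under stated assumptions: the role/action/domain policy triples, the in-memory `AUDIT_LOG` list, and the `request` function are hypothetical and stand in for whatever access-control and audit facilities a real platform provides.

```python
from datetime import datetime, timezone

# Hypothetical policy: (role, action, data domain) triples that are permitted.
POLICY = {
    ("sales_analyst", "read", "accounts"),
    ("sales_analyst", "sync", "accounts"),
    ("saas_admin", "read", "contacts"),
    ("saas_admin", "sync", "contacts"),
}

# Hypothetical append-only audit trail (a real system would persist this).
AUDIT_LOG = []


def request(user: str, role: str, action: str, domain: str) -> bool:
    """Check a self-service request against POLICY and audit it either way."""
    allowed = (role, action, domain) in POLICY
    AUDIT_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "domain": domain,
        "allowed": allowed,
    })
    return allowed
```

A sales analyst syncing `accounts` would be allowed, while the same analyst reading a `payroll` domain would be refused; both attempts land in the audit trail, which is what lets IT govern self-service after the fact instead of gatekeeping every request up front.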
A data integration solution that doesn’t support native UI generation for SaaS apps such as Salesforce.com creates costly, time-consuming overhead, because IT must write custom code and build bespoke UIs. By combining simpler data access for business users with improved governance and oversight, we can achieve faster and better data integration for the business.
Unified Data and Process Integration
Data and process integration need to work seamlessly together; it’s not just that the lines between them are blurring. A unified data and process integration solution provides the best of both worlds and needs to support long-running transactions, human workflow, business self-service, and template-driven extensibility.
Managed Master Data
Managed master data can glue an enterprise’s transactions, events, and interactions together to create a single view. Managed reference data describes the processes and classifications of master data and transactions across your organization. And managed metadata helps you see what you have, where you have it, when it changed, and who is responsible for it. A real data integration solution manages all three together, so we can understand how they interconnect and manage data complexity, the implications of change, and quality issues.
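Creating a “single view” from master data usually means matching records that describe the same entity and then merging them with survivorship rules. The sketch below assumes a deliberately simple setup: records from three hypothetical systems are matched on a normalized email address, and each attribute survives from the most recently updated record that has a non-empty value. The record layout, match key, and survivorship rule are all assumptions for illustration; an MDM hub would typically support fuzzy matching and configurable, per-attribute trust rules.

```python
from collections import defaultdict

# Hypothetical records describing customers in three source systems.
records = [
    {"source": "crm", "email": "ANA@example.com", "name": "Ana Silva",
     "phone": None, "updated": "2013-01-10"},
    {"source": "billing", "email": "ana@example.com", "name": "A. Silva",
     "phone": "555-0100", "updated": "2013-03-02"},
    {"source": "support", "email": "bo@example.com", "name": "Bo Chen",
     "phone": "555-0199", "updated": "2012-11-20"},
]


def golden_records(rows, attrs=("name", "phone")):
    """Match rows on a normalized email, then merge with most-recent-non-null survivorship."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["email"].strip().lower()].append(row)

    masters = {}
    for key, members in groups.items():
        # ISO dates sort correctly as strings; newest first so the first hit wins.
        members.sort(key=lambda r: r["updated"], reverse=True)
        master = {"email": key, "sources": sorted(r["source"] for r in members)}
        for attr in attrs:
            master[attr] = next((r[attr] for r in members if r[attr]), None)
        masters[key] = master
    return masters
```

With this rule, Ana’s master record takes “A. Silva” from the newer billing record along with its phone number, and lists both contributing systems. Whether “newest wins” or “most trusted source wins” is the right rule is exactly the kind of decision managed reference data and metadata should make visible.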
Data integration is much more than batch processing. Data integration is more than ETL. The value of data is now headline news in the mainstream press, but the variety, complexity, and pace of change of data are greater than ever. A modern, complete data integration solution helps you harness the vast volumes of data across on-premises systems, the cloud, social media, mobile, and other devices to power your business and meet your customers’ needs in the most agile and efficient way possible.