Why is ETL a four-letter word?

ETL (Extract-Transform-Load) technology has been around for more than a decade, and while it rocked the world in the ’90s, it’s considered a bit of a relic nowadays. Data warehousing, the original driver for ETL technology, isn’t considered as sexy anymore. That’s partly why vendors have adopted different names to broaden this software category and have added new capabilities to keep it relevant.

Informatica is no exception. We’re “the Data Integration Company,” where data integration consists of many different capabilities, only one of which is ETL (granted, the ETL piece is the cornerstone for data warehousing and other data integration projects).

And the letters E-T-L themselves have been put in the blender and reconfigured into newer, fresher concepts. ELT (or ETLT) incorporates the concept of pushdown optimization, where transformation processing is handled in the database instead of on the ETL server. (For more detail, Rajan Chandras has a good post discussing ETL vs. ELT.) ETQL pulls data quality into the ETL workflow. And I’m sure the permutations will continue.
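To make the ETL vs. ELT distinction concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for the target database. The table names, columns, and sample rows are hypothetical, and this is only an illustration of the general idea, not how Informatica or any particular product implements pushdown optimization. In the ETL path the transformation runs in the integration process before loading; in the ELT path raw rows are loaded into a staging table first and the database itself runs the transformation as SQL.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount_in_cents, country_code)
SOURCE_ROWS = [(1, 1250, "us"), (2, 990, "de"), (3, 4000, "us")]


def make_target() -> sqlite3.Connection:
    """Create an in-memory 'warehouse' with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    return conn


def etl(rows):
    """Classic ETL: transform outside the database, then load the finished rows."""
    # Transform step runs here, in the integration process (the "ETL server").
    transformed = [(oid, cents / 100.0, cc.upper()) for oid, cents, cc in rows]
    conn = make_target()
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
    return conn


def elt(rows):
    """ELT / pushdown: load raw rows, then let the database run the transform."""
    conn = make_target()
    # Load step: raw data goes into a staging table untouched.
    conn.execute("CREATE TABLE staging (order_id INTEGER, cents INTEGER, country TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    # Transform step is pushed down into the database as a single SQL statement.
    conn.execute(
        "INSERT INTO orders SELECT order_id, cents / 100.0, UPPER(country) FROM staging"
    )
    return conn


if __name__ == "__main__":
    for name, fn in (("ETL", etl), ("ELT", elt)):
        result = fn(SOURCE_ROWS).execute("SELECT * FROM orders ORDER BY order_id").fetchall()
        print(name, result)
```

Both paths end with the same rows in the target table; what differs is where the transformation work is done, which is the point of pushdown optimization when the database engine is better placed to process large volumes.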

So, is classic ETL just not relevant anymore? Of course it is. It’s still the workhorse that moves huge amounts of data from one system to another. It would be hard to get the other important data integration work (data quality, B2B integration, real-time synchronization, metadata management) done without ETL as the foundation. And most companies would simply not be able to keep data integrated across their environment without ETL.

A different way to ask the question is: “Has basic ETL been commoditized?” The answer, of course, depends on how you define things. Are there open source and very low-cost ETL offerings on the market? Sure. Can they extract data, perform some transformations on it, and load it someplace? Sure. Will they work for your environment? That’s the $64,000 question.

Here’s the thing. Every organization has some simple data movement use cases where a very basic ETL capability is more than sufficient. But those situations probably don’t cause a whole lot of pain anyway. The stuff that is painful to the business is the stuff that’s hard to solve: obscure and exotic data formats, extreme performance requirements, complex business logic, and really dirty data that has to be cleansed. That’s when you need ETL++, or data integration, or whatever you want to call it, as Gartner points out in an article commenting on the data integration Magic Quadrant.

Fundamentally, organizations are still struggling with really hairy data issues, and there is no cookie-cutter solution for them. When the day comes that there is one, ETL will be commoditized. But I’m not holding my breath.
