Data Integration - Informatica

Informatica Enterprise Data Management

What is ELT?

James Markarian

ELT stands for Extract Load and Transform and is a technique for moving and transforming data from one location and format to another instance and format. In this style of integration, the target DBMS becomes the transformation engine. This is in contrast to traditional ETL, or Extract Transform and Load which is the traditional technique for moving and transforming data in which an ETL engine that is separate from either the source or target DBMS performs the data transformations. Many of the discussions I’ve read on the topic of ETL vs. ELT focus on discussions like how does one work vs. the other and when I should one use a particular approach. I’d like to talk about a few of those questions, but first I’d like to tackle a slightly different question: why should you care about ELT?When thinking about data integration techniques, it’s important to understand what you are optimizing for. Depending on the objectives, one could optimize for timeliness (average time of information availability), cost of integration, availability of information (uptime of the integration chain), data cleanliness and integrity, process auditability or other factors. An understanding of the primary optimization objective is the first step in determining an overall integration approach.

The reality is, with a scalable Data Integration platform like Informatica and a scalable dbms platform, one can basically achieve any level of performance they desire using either integration technique. What we often lose sight of, however, is cost and efficiency. Some of the writings I’ve read on the topic, including the excellent blog from Dan Lindstedt. — ETL, ELT - Challenges and Metadata

Dan introduces what I would consider to be arbitrary volume metrics to determine when to use one technique vs. the other which more or less end up looking like: high volume = ELT, lower volume = ETL.

Informatica supports both techniques. So why do we care about ELT? Because it provides optimal efficiency and reduced cost in certain integration scenarios regardless of the data volumes involved. This is an indirect way of stating that ELT can be the most efficient and therefore proper way to process data regardless of volume. It can also be the least efficient regardless of volume. Let’s explore this further.

Everything in IT has at least four categories of costs associated with it: the upfront licensing cost, the cost of operation (electricity, cooling, etc.), the implementation cost and the maintenance cost which includes both manpower and vendor software support. If we are optimizing integration workloads, we must consider most of these and we must consider them both in the context of integration platform (hardware and software) and dbms (hardware and software) platform costs.

Integration jobs, even on the same project, can have tremendously different signatures. Some information gets passed through relatively unscathed (un-transformed) and is more or less passed through as-is. The assumption is that the time cost of doing format-oriented transformations (unpadding, splitting, text case standardizing, etc) is zero amortized over the networking or process switching costs of moving from one data source to another or the I/O cost of reading and writing the data from the source and to the target respectively. Do the processing in the ETL engine, do it ELT style in the source dbms or target dbms, it doesn’t matter much. It should all perform about the same.

Selection and projection operations, hierarchical operations, set-oriented transformations like heterogeneous joins or aggregates on either entire sets of data or small-sets with in streams of real-time data on the other hand have high compute costs relative to other costs (and in some cases can introduce additional I/O).

No single approach to integration (ELT or ETL) will give an optimal solution to all workloads from either the perspective of cost, efficiency or performance.

Some questions to consider:

1. What is the current workload on my source and target dbms platforms?
2. What is the cost for the hardware and software to add additional computing resources to my dbms environment?
3. What is the cost for the hardware and software to add additional computing resources to my integration environment?
4. What is the relative efficiency of performing a particular operation in my source system, target system or integration system?

The last question of course is the toughest because while the first three questions have absolute answers, the fourth depends on the operation. Also, from a real-world throughput and performance perspective, it varies with the rest of the workload in all three environments.

In the next blog entry, I’ll further explore specific integration problems and break down how each should be executed in order to optimize cost and efficiency.

4 Comments, Comment or Ping

  1. Amit Baranwal

    I read your blog. But i need crystal clear view on ELT. If ELT means just Extract, Load & Transformation. Then let us assume one hypothetical case:
    (i) If i load data from source data into target via using any version of Informatica tool between them. Would this be case of ELT?

    (ii) If i transform data of data warehouse using any version of Informatica tool e.g. take one table of data warehouse transform its data, then put it in another table of same data warehouse. Would this be case of ELT?

    If these all the case of ELT, then why it is said that Informatica 8 only has ELT support not previous version of the same.

    Actually i read many blogs but none of them discussing about real implementation of ELT.

    I am waiting for reply.

  2. Amit-

    ELT differs from traditional ETL or Data Integration in that some or all of the transformation logic is pushed to the database engine for execution. We call this process Push Down Optimization because it is a performance optimization, in some cases, to co-locate the transformation logic where data already resides.

    In traditional ETL or Data Integration applications, 100% of the transformation is executed by the ETL/DI engine. When to use one vs. the other depends on a number of factors including location of the data, network bandwidth, availability of compute resources in various locales and the permissability of pushing logic to the dbms as specified by the dba's. Starting in version 8.0 of Informatica, we give the user the option of deploying transformation logic to either be executed by the Informatica engine (this is the ETL model), by the dbms engine (this is the ELT model) or a hybrid (this is effectively ETLT) in which part of the execution is handled by Informatica and part by the dbms. The mapping/specification is exactly the same in both cases and this is merely a deployment option. The beauty of this is that you get all the lineage, impact analysis and metadata at the same time as optimal performance.

    So, to answer your questions directly:

    (i) No. You must use a version that supports push down optimization and choose this as a deployment option. Keep in mind that if you are simply extracting source data and loading it into a target with no transformation, then ETL or ELT are equivalent. If you are transforming the data using the Informatica engine alone, you are doing tradtional ETL. If you choose to use push-down optimization and have the logic execute in ELT fashion, then Informatica determines if and how the logic can be covered into SQL and execute within the database.

    (ii) The issue isn't whether data is extracted, transformed and loaded that determines whether a process is considered ETL or ELT according to my definition, but which engine is doing the work. Using pre-8.0 of Informatica, there was no option. The Informatica engine had to do all of the work and everythign executed as ETL. In version 8.0 and beyond, you have the option of pushing down all of the logic in this case (your example would get converted into a (INSERT INTO xxx as SELECT from yyy) statement instead of having the data get pulled out of the dbms, transformed by the Informatica engine and inserted into the new table. Pushdown in this case can be very efficient, but does increase the load on the dbms.

    -james

  3. Amit Baranwal

    Thank you for reply. I have got most of the things clear by you. I'll again discuss, if i need.

Reply to “What is ELT?”