ETL is not pervasive yet!
Posted in Architecture, Enterprise Data Management by Rick Sherman
Recently I moderated a panel at the Boston TDWI chapter (I am a chapter officer) on emerging trends in business intelligence (BI). I framed the discussion by having the panelists position technology in the five stages of the Gartner Hype Cycle.
It was a lot of fun and provided some good insights. The panel agreed that ETL was on the productivity plateau — meaning it was mainstream and commonplace. Everyone assumes everyone is doing it, but I challenged whether it was truly pervasive.
To support my claim I did an informal survey of the audience and asked some questions on their use of ETL. Sure enough, everyone was using it — that’s great news. And everyone was using it to load their data warehouse — again terrific.
But here is where the fun and eye-opening insight begins. When I asked whether they used their ETL tool to load their data marts, it turned out most did not. And how many loaded their OLAP cubes with their ETL tool? Almost nobody.
This is consistent with what I see time and time again at my clients and what I hear from fellow consultants and IT folks. Recent surveys indicate that approximately 45% of ETL work is done by hand-coding.
Using ETL to load a data warehouse (DW) is a great first step, but why aren’t people using their ETL tools to load their data marts and cubes? Loading a DW involves substantial data quality and consistency processing, and you often have to deal with large data volumes. That is a perfect application for an enterprise-class data integration tool. But loading data marts and cubes generally involves implementing significant (and critical) data transformations, business rules, filters and aggregations. These data marts and cubes are what the business uses for the reports and analysis that support its decision making. How can we validate and audit what the business is using to make decisions if the “last mile” of data integration is implemented via hand-coding?
The answer is we really cannot.
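To make the audit question concrete, here is a minimal sketch in Python of the kind of trail a last-mile load could leave behind. It is an illustration only, not how any particular ETL tool works: the function `load_sales_mart`, the `AuditRecord` fields, and the column names (`region`, `status`, `amount_cents`) are all hypothetical, and a real tool would capture this metadata for you rather than leaving it to the developer.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AuditRecord:
    """What a reviewer might need to trace one mart load (hypothetical shape)."""
    step: str
    rules_applied: list = field(default_factory=list)
    rows_read: int = 0
    rows_rejected: int = 0
    rows_written: int = 0
    source_total_cents: int = 0  # amount that passed the filter
    mart_total_cents: int = 0    # amount that landed in the mart

    @property
    def reconciles(self) -> bool:
        # Control total: what passed the filter must equal what was loaded.
        return self.source_total_cents == self.mart_total_cents


def load_sales_mart(warehouse_rows):
    """Filter and aggregate warehouse rows into a region-level mart."""
    audit = AuditRecord(
        step="load_sales_mart",
        rules_applied=["keep status == 'posted'", "sum amount_cents by region"],
    )
    totals = defaultdict(int)
    for row in warehouse_rows:
        audit.rows_read += 1
        if row["status"] != "posted":  # business rule: only posted sales count
            audit.rows_rejected += 1
            continue
        audit.source_total_cents += row["amount_cents"]
        totals[row["region"]] += row["amount_cents"]  # aggregation per region

    mart_rows = [
        {"region": region, "revenue_cents": cents}
        for region, cents in sorted(totals.items())
    ]
    audit.rows_written = len(mart_rows)
    audit.mart_total_cents = sum(m["revenue_cents"] for m in mart_rows)
    return mart_rows, audit
```

The point is not the arithmetic but the by-product: the rules are named, the row counts are recorded, and the control total either reconciles or it does not. A hand-coded script rarely leaves that behind unless someone takes the trouble to build it in.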
Almost all my clients are in the same boat, including the clients I worked with at PricewaterhouseCoopers (PwC) Consulting (before it was acquired by IBM). They have invested significantly in enterprise-class data integration software, deployed it to load their data warehouses, and then relied on hand-coding for that “last mile.”
If using data integration software to load the DW made sense, then the enterprise needs to take the next step and make data integration pervasive. That is how you truly manage your data assets and provide the business with information it can use to make decisions.
How do you proceed? First, sell the approach to the business. They are worried about compliance. Given the investments they have made in their data assets, they are probably very motivated to establish data integration as a pervasive process.
Second, make it a best (and only) practice to use data integration software when integrating data in your data warehousing and business intelligence environments. Resist the urge to do it quickly because that almost always means “quick and dirty.” Although hand-coding may be cheaper initially, in the long run it is more costly and risky, especially since its documentation is rarely kept up to date, if it was created at all.
Finally, establish a program to systematically replace your hand-coded data integration processes with processes built in the data integration software.
If we want to truly deliver enterprise data management, we cannot stop at the data warehouse. We need to manage our data integration all the way to the point where the business is generating reports or performing analysis. Managing that “last mile” from the data warehouse to the information consumer is what moves an enterprise into delivering information to the business via best practices.




