Tag Archives: Technology
In my last post I started to talk about ideas for classifying data management issues, reasoning that classification helps determine how likely it is that acquiring a particular solution will actually address the core issues. I have used this categorization with some of our customers, and the process of classification does lend some clarity when considering solutions. There are five categories: (more…)
ETL (Extract-Transform-Load) technology has been around for over a decade, and while it rocked the world in the ’90s, it’s considered a bit of a relic nowadays. Data warehousing, the original driver for ETL technology, isn’t considered as sexy anymore. That’s in part why vendors have used different names to broaden this software category and added new capabilities to keep it relevant.
Informatica is no exception. We’re “the Data Integration Company”, where data integration consists of many different capabilities, only one of which is ETL (granted, the ETL piece is the cornerstone for data warehousing and other data integration projects).
And the letters E-T-L themselves have been put in the blender to be reconfigured into newer, fresher concepts. ELT or ETLT incorporates the concept of pushdown optimization, where processing is handled in the database, instead of the ETL server. (For more detail, Rajan Chandras has a good post discussing ETL vs. ELT.) ETQL pulls data quality into the ETL workflow. And I’m sure the permutations will continue.
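To make the pushdown distinction concrete, here is a minimal sketch (using Python’s sqlite3 as a stand-in for a warehouse; the table, data, and filter are invented for illustration) of the same transformation done engine-side, as in classic ETL, versus pushed down to the database as SQL, as in ELT:

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 20.5), (3, 5.25)])

# ETL style: extract the rows, then transform in the integration
# engine (here, plain Python does the filtering and summing).
rows = conn.execute("SELECT id, amount FROM orders").fetchall()
etl_total = sum(amount for _, amount in rows if amount > 6)

# ELT / pushdown style: the same transformation expressed as SQL,
# so the database does the work instead of the engine.
elt_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE amount > 6"
).fetchone()[0]

assert etl_total == elt_total
```

Either route produces the same answer; the difference is where the CPU cycles are spent, which is exactly what pushdown optimization lets you choose.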
So, is classic ETL just not relevant anymore? (more…)
Are BI managers and professionals sometimes too eager to please the business? Are centralized BI efforts slowing down progress? Should BI teams address requirements before the business even asks for them? These questions may seem counter-intuitive, but Wayne Eckerson, director of research for TDWI, says that the best intentions for BI efforts in many organizations may actually result in sluggish projects, duplication of effort, and misaligned priorities between BI teams and the business. (more…)
There’s no question that integrating analytical and transaction data to deliver “Pervasive Business Intelligence” can be a significant project for many enterprises. However, the good news is that it’s a capability that’s within the reach of many enterprises today. That’s the gist of a Q&A with three industry thought leaders, published in the latest edition of Intelligent Enterprise. (more…)
I’ll admit it: as an older brother, I didn’t want my younger sister borrowing or bugging me for my prized possessions. I still hoard things at work, old computer equipment, mice, cables, all in the name of finding a use for them at some point. I just like to know they’re there when I need them.
Is data treated the same way within corporations? Do application owners like sharing their data with others? In my experience, no, they don’t. Ask any mainframe or ERP program manager about utilizing their production data for other purposes and I’m sure you’ll receive a litany of questions about impact to production systems, utilization costs, and complexity of access. And IT’s list of business requests for access to these precious resources is only growing. For many organizations, data access is a cultural problem.
Since launching the EDM blog in early 2007, we have focused on a wide variety of data management, Informatica usage, and technology topics. In 2008, I will also be discussing my experiences and research in Enterprise Data Warehousing, an area in which our customers have used our software and solutions to great success.
Enterprise Data Warehousing is a term that has been around for a long time. In the mid-’90s, Bill Inmon preached an enterprise approach to data warehousing based on a central repository of corporate data. With the technology of the time, success was attainable only by a few elite organizations at extreme levels of funding. Informatica pioneered an incremental data mart approach that led to years of prosperity in the data warehousing market for Informatica and for customers using our technology in their data warehousing projects.
A recent InformationWeek article* described the growth in IT employment across the US as a result of a shift in skills. Rather than focusing on pure IT proficiency, organizations are looking for talent with “a more hybrid mix of technology skills, along with an understanding of the business and its customers.”
IT departments are highly motivated to increase the level of collaboration with their counterparts in the business. Nowhere is this more critical than in the area of data quality, and the trend is causing a shift in the way companies are looking to solve their data quality issues. First-generation data quality tools had a natural focus on technology rather than the business. Here are some of the differences between technology-focused and business-focused data quality solutions.
Tools vs. Process
Technology-focused data quality solutions provide tools that automate data processing. Evidence of this type of focus can be seen in the way vendors tout the sophistication of their algorithms over their ability to support ongoing data quality management processes. While technology is extremely important, its relevance cannot eclipse the overall data quality management process. Even if your data quality tool can automate the correction of 95 percent of the data, if the remaining five percent cannot be managed properly, you will continue to suffer from poor data quality.
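As a hypothetical illustration of why process matters as much as the algorithm, the sketch below (the cleansing rule, field, and sample data are all invented) auto-corrects what it can and routes the rest to a review queue; that queue is exactly where an ongoing data quality management process, with owners and workflow, has to take over from the tool:

```python
import re

def clean_phone(raw):
    """Attempt automated correction; return (value, ok) so failures
    can be routed to a manual review queue rather than dropped."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 10:                       # auto-correctable case
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}", True
    return raw, False                           # needs human review

records = ["415-555-0100", "(415) 555 0101", "n/a", ""]
corrected, review_queue = [], []
for r in records:
    value, ok = clean_phone(r)
    (corrected if ok else review_queue).append(value)

# 'corrected' holds the automated wins; 'review_queue' holds the
# residual records where the *process* must decide what happens next.
```

The tool handled half of these records; without a defined process for the other half, they simply accumulate as bad data.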
The recent Informatica Release 8.5 launch highlighted Real-time Integration Competency Centers (ICCs) as the optimal model for successful data integration. I’d like to review the concept of the Real-time ICC and why Release 8.5 supports this advanced operational, organizational and technology model.
As data integration moves beyond the realm of data warehousing into operational integration, real-time and data services use cases have exploded in importance to the business and necessitated a stronger, unified infrastructure for IT to meet the challenge. Philip Russom, Senior Manager of TDWI Research, captures this trend in his quote on Release 8.5.
“The movement toward real-time data access and delivery has been the most influential trend in data integration this decade. The trend has enabled user organizations to initiate a variety of valuable real-time practices, including operational BI, real-time data warehousing, on-demand computing, performance monitoring, just-in-time inventory, and so on. And the trend has led vendors to extend their data integration products, so that many functions operate in real-time, not just batch. Informatica 8.5 is a great example of this trend, because it’s re-architected to support more real-time and on-demand functions for data integration, changed data capture, and data quality.” (more…)
We’ve been discussing the three pillars of an ICC (organization, process, and technology) for a while now. In this segment, I’ll focus on a range of technology requirements facing ICC implementation teams, whether they are starting from scratch or morphing a set of disparate solutions into a common infrastructure. It goes without saying that, to meet the demands of a broader set of enterprise needs rather than those of a single line of business, the infrastructure powering an ICC needs to evolve and mature.
One of the first aspects of infrastructure is the need for high availability, which pertains to the overall integration environment. Shared infrastructure, by its very nature, raises the bar for reliability. An outage of a single point solution is acceptable and explainable, but when several organizations rely on solutions delivered by an ICC, outages can significantly impact revenue and productivity.
When I teach data integration concepts in my classes at various corporations or to my students at Northeastern University, one of the topics I address is the differences between engine-based and database-based data integration tools.
You may be more familiar with their other names:
• ETL (extract, transform and load) or engine-based
• ELT (extract, load and transform) or database-based tools
Inevitably, someone asks me which one is best. I have an answer on two levels. The first is that neither is best under all circumstances; the right choice depends entirely on the data integration scenario being addressed. Evolving industry trends mean that enterprises often need both alternatives to meet their full range of data integration needs.
The second part of the answer is that until recently you could not compare ETL versus ELT purely on their own merits, because the data integration tools that performed each approach tended to operate on two ends of the spectrum.
Historically, the best-of-breed tools, as measured by Gartner Research and Forrester Research, provided an engine-based (ETL) approach. That meant the two opposing camps were debating not just the technical merits of these alternatives but also the underlying data integration tools that provided the functionality. Fortunately, this constraint has recently changed.
Let’s quickly look at the history. The first generation of data integration tools consisted of code generators. At the same time, enterprises were just beginning to adopt relational databases. Both that generation of tools and the relational databases of the day had limited functionality.
Engine-based ETL tools were developed to address enterprises’ unmet data integration needs. The top-tier ETL tools have evolved into data integration suites incorporating much more than simply ETL processing.
But something happened along the way: relational databases, and the way they were used, grew more sophisticated. Nowadays databases can perform many integration tasks very effectively, especially if the data continues to reside in the same database instance or server.
That brings us to the second thing that changed. Data is now more apt to stay in the same database, even after it has been integrated into a data warehouse. Formerly, data was moved into data marts or cubes in separate databases.
This is the perfect condition under which to consider whether to use an ELT or an ETL approach.
Integrating data into the data warehouse involves the heavy lifting: data cleansing, conforming dimensional data, and handling slowly changing dimensions. On the other hand, moving it into data marts or cubes involves the more lightweight tasks of filtering, aggregating, and applying business transformations.
ETL is most often the choice for the heavy-lifting requirements of the former. ELT might be the better choice for the lightweight tasks of the latter.
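The split above can be sketched in miniature (again using sqlite3 as a stand-in warehouse; the customer dimension, Type 2 logic, and mart table are invented for illustration). The heavy lifting of a slowly changing dimension is naturally row-by-row engine logic, while the mart load reduces to a single statement the database can execute itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    customer_id INTEGER, city TEXT, is_current INTEGER)""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Boston', 1)")

# ETL-style heavy lifting: a Type 2 slowly changing dimension.
# The engine compares each incoming row with the current dimension
# row, expires the old version, and inserts a new current one.
def scd2_apply(conn, customer_id, new_city):
    row = conn.execute(
        "SELECT city FROM dim_customer "
        "WHERE customer_id=? AND is_current=1", (customer_id,)).fetchone()
    if row and row[0] != new_city:
        conn.execute("UPDATE dim_customer SET is_current=0 "
                     "WHERE customer_id=? AND is_current=1", (customer_id,))
        conn.execute("INSERT INTO dim_customer VALUES (?, ?, 1)",
                     (customer_id, new_city))

scd2_apply(conn, 1, "Chicago")   # customer moved; history is preserved

# ELT-style lightweight step: filter into a mart table with one
# statement, pushed down entirely to the database.
conn.execute("""CREATE TABLE mart_current_customers AS
    SELECT customer_id, city FROM dim_customer WHERE is_current=1""")
```

The dimension now carries both the expired Boston row and the current Chicago row, while the mart sees only current data; the history-preserving comparison logic is what makes the first step "heavy" and a poor fit for a single SQL statement.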