Tag Archives: Architecture
The recent Informatica Release 8.5 launch highlighted Real-time Integration Competency Centers (ICCs) as the optimal model for successful data integration. I’d like to review the concept of the Real-time ICC and why Release 8.5 supports this advanced operational, organizational and technology model.
As data integration moves beyond the realm of data warehousing into operational integration, real-time and data services use cases have exploded in importance to the business and necessitated stronger, unified infrastructure for IT to meet the challenge. Philip Russom, Senior Manager, TDWI Research, captures this trend in his quote on Release 8.5:
“The movement toward real-time data access and delivery has been the most influential trend in data integration this decade. The trend has enabled user organizations to initiate a variety of valuable real-time practices, including operational BI, real-time data warehousing, on-demand computing, performance monitoring, just-in-time inventory, and so on. And the trend has led vendors to extend their data integration products, so that many functions operate in real-time, not just batch. Informatica 8.5 is a great example of this trend, because it’s re-architected to support more real-time and on-demand functions for data integration, changed data capture, and data quality.”
Recently I moderated a panel at the Boston TDWI chapter (I am a chapter officer) on emerging trends in business intelligence (BI). I framed the discussion by having the panelists position technology in the five stages of the Gartner Hype Cycle.
It was a lot of fun and provided some good insights. The panel agreed that ETL was on the productivity plateau — meaning it was mainstream and commonplace. Everyone assumes everyone is doing it, but I challenged whether it was truly pervasive.
To support my claim I did an informal survey of the audience and asked some questions on their use of ETL. Sure enough, everyone was using it — that’s great news. And everyone was using it to load their data warehouse — again terrific.
But here is where the fun and eye-opening insight begins. When asked if they used their ETL tool to load their data marts, most said they did not. And how many loaded their OLAP cubes with their ETL tool? Almost nobody.
This is consistent with what I see time and time again at my clients and what I hear from fellow consultants and IT folks. Recent surveys indicate that approximately 45% of ETL work is done by hand-coding.
One technical challenge not often discussed in data integration circles is the impact of real-time data on performance and scalability. I attribute this to a lack of real-world experience in handling real-time data, or a lack of recognition by IT that data integration software can effectively manage real-time data. Many architects and IT developers I meet lump real-time into the EAI domain. This was a logical assumption five years ago, because the data integration market was then primarily known for tackling “large batch volume” workloads (or, as I like to refer to them, “big batch problems”).
Informatica has spent 10 years focused to a good degree on solving that “big batch” problem. The inherent division between design time and run time in the underlying platform architecture enabled the introduction of parallelization/partitioning techniques, 64-bit processing, support for RDBMS vendor-supplied batch utilities/APIs and improved data conversion/transformation, all without impacting the business logic design. This has proven invaluable to our customers in meeting their increasing volume requirements within shrinking load windows.
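To make the design-time/run-time split concrete, here is a minimal Python sketch. It is not Informatica's implementation; the function names (`transform`, `partition`, `run`) and the sample fields are hypothetical. The point it illustrates is that the business logic is written once, while the degree of partitioning is chosen only when the job runs.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(row):
    """Design-time business logic: defined once, unaware of how it is executed."""
    return {"id": row["id"], "amount_usd": round(row["amount_cents"] / 100, 2)}


def partition(rows, n):
    """Split rows into n round-robin partitions."""
    return [rows[i::n] for i in range(n)]


def run(rows, partitions=1):
    """Run-time decision: process the same logic over one or many partitions."""
    chunks = partition(rows, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = pool.map(lambda chunk: [transform(r) for r in chunk], chunks)
    merged = [r for chunk in results for r in chunk]
    # Sort so the output does not depend on how the rows were partitioned.
    return sorted(merged, key=lambda r: r["id"])
```

Calling `run(rows, partitions=1)` and `run(rows, partitions=4)` yields the same result; only the execution plan differs, which is the property that lets a load window shrink without touching the mapping.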
We’ve been discussing the three pillars of an ICC (organization, process and technology) for a while now. In this segment, I’ll focus on a range of technology requirements facing ICC implementation teams, whether they are starting from scratch or morphing a set of disparate solutions into a common infrastructure. To meet the demands of a broader set of enterprise needs rather than those of a single line of business, the infrastructure powering an ICC needs to evolve and mature.
One of the first aspects related to infrastructure is the need for high availability. This pertains to the overall integration infrastructure environment. “Shared Infrastructure” by its very nature increases the need for reliability. An outage of a single point solution is acceptable and explainable, but when several organizations are relying on solutions delivered by an ICC, outages can significantly impact revenue and productivity.
As mentioned in my previous post, I was at Informatica World 2007 in Orlando last week. I stand corrected: this year was the 9th official conference; my customer event in 1997 didn’t officially count. I’m sure next year will be an even bigger celebration of “Integration Everywhere”. The customer and partner interaction was quite refreshing, with ample time to network with peers and explore possible solutions provided by third parties. The nightlife was vibrant and active.
One of the most compelling sessions I attended was on data integration performance optimization and a new technique that leverages both a database engine and the traditional “ETL Server”. The technique is called a “Hybrid ETLT Approach”. Stephen Brobst, CTO of Teradata, opened with a review of scenarios where processing in the database makes the most sense and where a data integration engine is best used in the end-to-end data processing and delivery lifecycle. Stephen also showed that design really doesn’t change at all: you build the same transformations, and at run time the location of processing can be optimized. That way, metadata is still preserved and all the speed and compliance benefits of a visual design approach still apply.
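The idea of building a mapping once and choosing where it executes at run time can be sketched roughly as follows. This is an illustration under stated assumptions, not the product's actual mechanism: the `MAPPING` structure and all function names are hypothetical, and Python's built-in `sqlite3` module stands in for the database engine.

```python
import sqlite3

# Hypothetical mapping, designed once: keep orders above a threshold
# and add a derived column (amount multiplied by a constant).
MAPPING = {"source": "orders", "min_amount": 100, "multiplier": 2}


def run_in_engine(conn, m):
    """Extract all rows, then filter and transform them outside the database."""
    rows = conn.execute(
        f"SELECT id, amount FROM {m['source']} ORDER BY id"
    ).fetchall()
    return [(i, a, a * m["multiplier"]) for i, a in rows if a > m["min_amount"]]


def run_pushdown(conn, m):
    """Translate the same mapping to SQL so the database does the work."""
    sql = (
        f"SELECT id, amount, amount * ? FROM {m['source']} "
        "WHERE amount > ? ORDER BY id"
    )
    return conn.execute(sql, (m["multiplier"], m["min_amount"])).fetchall()


def run(conn, m, pushdown):
    """Run-time switch: the mapping itself is identical in both cases."""
    return run_pushdown(conn, m) if pushdown else run_in_engine(conn, m)


def demo_connection():
    """Build a small in-memory table to run the mapping against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 50), (2, 150), (3, 300)]
    )
    return conn
```

Both `run(conn, MAPPING, pushdown=True)` and `run(conn, MAPPING, pushdown=False)` return the same rows. The choice between them would depend on factors such as data volume and where the data already lives, which matches the scenario review described in the session.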