Tag Archives: ETL
On a recent trip to a new city, someone said that the easiest way from the airport to the hotel was to use the Metro. I could speak the language, but reading it was another matter. I was surprised by how quickly I navigated to the hotel by following the Metro map. The Metro map is based on the successful design of the London Underground map.
Harry Beck was not a cartographer. He was an engineering draftsman who started drawing a different type of map in his spare time. Beck believed that passengers were not worried about the map's distance accuracy. He reduced the map to straight lines and sharp angles, which produced something closer to an electrical schematic than to a conventional geographic map. The company that ran the London Underground was skeptical of Beck’s map since it was radically different and they had not commissioned the project. (more…)
This is the second installment of my multi-part blog series on “hitting the batch wall.” Well, it’s not so much about hitting the batch wall, but what you can do to avoid hitting the wall. Today’s topic is “throwing hardware” at the problem (a.k.a. hardware scaling). I’ll discuss the common approaches and the tradeoffs of hardware scaling with Informatica software.
Before I begin to discuss hardware scaling, I'll start with this warning: faster hardware only improves the load window situation when it resolves a bottleneck. Data integration jobs are a lot like rush hour traffic: they can only run as fast as the slowest component. It doesn’t make any sense to buy a Ferrari if you will always be driving behind a garbage truck. In other words, if your ETL jobs are constrained by the source/target systems or I/O or even just memory, then faster/more CPUs will rarely improve the situation. Understand your bottlenecks before you start throwing hardware at them! (more…)
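To make the garbage-truck point concrete, here is a minimal sketch in Python. The stage names and rows-per-second figures are hypothetical, chosen only for illustration; they are not measurements from any Informatica job. It models a job as a chain of stages whose overall rate is capped by the slowest one, and shows that doubling CPU capacity changes nothing when the source read is the bottleneck.

```python
# Illustrative model only: stage names and rates below are made-up numbers.

def pipeline_rate(stage_rates: dict[str, float]) -> float:
    """Overall rows/second: a pipeline runs only as fast as its slowest stage."""
    return min(stage_rates.values())


def bottleneck(stage_rates: dict[str, float]) -> str:
    """Name of the stage that limits the whole pipeline."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical job: the source read is the slowest stage.
before = {
    "source_read": 20_000.0,
    "transform_cpu": 50_000.0,
    "target_write": 35_000.0,
}

# "Throwing hardware" at the CPU doubles only the transform rate.
after_cpu_upgrade = {**before, "transform_cpu": 100_000.0}

print(bottleneck(before), pipeline_rate(before))
print(bottleneck(after_cpu_upgrade), pipeline_rate(after_cpu_upgrade))
```

In this toy model both lines report the same bottleneck and the same rate, which is the whole argument: the upgrade helps only if it lands on the limiting stage.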
This year marks Informatica's 20th anniversary. Twenty years of solving the problem of getting data from point A to point B, improving its quality, establishing a single view and managing it over its life-cycle. Yet after 20 years of innovation and leadership in the data integration market, when one would think the problem had been solved and all data had been extracted, transformed, cleansed and managed, it actually hasn’t; companies still need data integration. Why? Data is a complicated business. And with data increasingly becoming central to business survival, organizations are constantly looking for ways to unlock new sources of it, use it as an unforeseen source of insight and do it all with greater agility and at lower cost. (more…)
In a recent webinar, Mark Smith, CEO at Ventana Research, and David Lyle, vice president of Product Strategy at Informatica, discussed “Building the Business Case and Establishing the Fundamentals for Big Data Projects.” Mark pointed out that the second-biggest barrier to improving big data initiatives is that the “business case is not strong enough.” The first and third barriers, respectively, were “lack of resources” and “no budget,” which are also related to having a strong business case. In this context, Dave provided a simple formula from which to build the business case:
Return on Big Data = Value of Big Data / Cost of Big Data (more…)
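As a quick illustration of that ratio, here is a minimal Python sketch. The function name and the dollar figures are hypothetical, not from the webinar; the only thing taken from the source is the formula itself, value divided by cost.

```python
def return_on_big_data(value: float, cost: float) -> float:
    """Return on Big Data = Value of Big Data / Cost of Big Data."""
    if cost <= 0:
        # The ratio is undefined for a zero cost and meaningless for a negative one.
        raise ValueError("cost must be a positive number")
    return value / cost


# Hypothetical figures, for illustration only.
ratio = return_on_big_data(value=300_000.0, cost=100_000.0)
print(ratio)
```

A ratio above 1 means the estimated value exceeds the estimated cost. The hard part of the business case is arriving at defensible estimates for the two inputs, not the division.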
“The report of my death was an exaggeration.”
– Mark Twain
Ah yes, another conference, another old technology declared dead. Mainframe… dead. Any programming language other than Java… dead. 8-track tapes… OK, well, some things thankfully do die, along with the Ford Pinto in which I used to listen to the Beatles Greatest Hits Red Album over and over again on that 8-track… ah yes, the good old days, but I digress. (more…)
Ever wondered if an initiative is worth the effort? Ever wondered how to quantify its worth? These are loaded questions, as you may suspect, but I wanted to ask them nevertheless, as my team of Global Industry Consultants works with clients around the world to do just that (a.k.a. Business Value Assessment, or BVA) for solutions anchored around Informatica’s products.
Because these solutions typically involve multiple core business processes stretching over multiple departments and leverage a legion of technology components such as ETL, metadata management, business glossary, BPM, data virtualization, legacy ERP, CRM and billing systems, it initially sounds like a daunting level of complexity. Opening this can of worms may end in measurement fatigue (I think I just discovered a new medical malaise). (more…)
In my previous blog on this subject, I talked about the incredible innovations of Hadoop as a new analytics engine, and the innovations of Informatica in removing complex, unmaintainable hand-coding. In this blog I want to drill into the world of Informatica ETL and Hadoop to show why these two innovations are critical to augmenting traditional data processing approaches as companies begin to look at leveraging Big Data for new analytics. (more…)
I recently had the pleasure of participating in a big data panel at the Pacific Crest investors' conference (the replay is available here). I was joined on the panel by Hortonworks, MapR, DataStax and Microsoft. There is clearly a lot of interest in the world of big data and how the market is evolving. I came away from the panel with four fundamental thoughts: (more…)
The widespread adoption of electronic health records (EHRs) is a key objective of the Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted as part of the American Recovery and Reinvestment Act of 2009. With the pervasive use of EHRs, an enormous volume of clinical data that was previously locked away in paper charts will be readily accessible. The potential of this data to yield insights into what works in healthcare, and what doesn’t, dwarfs the benefits of simply replacing a paper chart with an electronic system. There’s appropriate enthusiasm that this data is going to be a veritable goldmine for enterprise data warehousing, business intelligence, and comparative effectiveness research. However, there are other, equally valuable uses for this data: enhancing clinical decision-making and improving the value of healthcare spending. Simply having instant access to large volumes of data that span thousands or tens of thousands of physicians, hundreds of thousands of patients and millions of encounters offers an unparalleled opportunity to increase the quality and lower the cost of healthcare. (more…)