Tag Archives: ETL
Today, 80% of the effort in Big Data projects goes into extracting, transforming, and loading data (ETL). Hortonworks and Informatica have teamed up so that organizations can use the power of Informatica Big Data Edition, apply their existing skills to make these operations more efficient, and better leverage their resources in a modern data architecture (MDA).
Next Generation Data Management
The Hortonworks Data Platform and Informatica BDE enable organizations to optimize their ETL workloads with long-term storage and processing at scale in Apache Hadoop. With Hortonworks and Informatica, you can:
• Leverage all internal and external data to achieve the full predictive power that drives the success of modern data-driven businesses.
• Optimize the entire big data supply chain on Hadoop, turning data into actionable information to drive business value.
Imagine a world where you have access to your most strategic data in a timely fashion, no matter how old the data is, where it is stored, or in what format. By leveraging Hadoop's power of distributed processing, organizations can lower the cost of data storage and processing and support large data distribution with high throughput and concurrency.
Overall, alignment between business and IT grows. The Big Data solution based on Informatica and Hortonworks provides a complete data pipeline to ingest, parse, integrate, cleanse, and prepare data for analysis natively on Hadoop, increasing developer productivity by 5x over hand-coding.
Where Do We Go From Here?
At the end of the day, Big Data is not about the technology. It is about the deep business and social transformation every organization will go through. The possibilities to make more informed decisions, identify patterns, proactively address fraud and threats, and predict pretty much anything are endless.
This transformation will happen as the technology is adopted and leveraged by more and more business users. We are already seeing the transition from 20-node clusters to 100-node clusters and from a handful of technology-savvy users relying on Hadoop to hundreds of business users. Informatica and Hortonworks are accelerating the delivery of actionable Big Data insights to business users by automating the entire data pipeline.
Try It For Yourself
On September 10, 2014, Informatica announced a 60-day trial version of the Informatica Big Data Edition, delivered inside the Hortonworks Sandbox. This free trial lets you download and test the Big Data Edition on your notebook or a spare computer and experience your own personal Modern Data Architecture (MDA).
If you happen to be at Strata this October 2014, please meet us at our booths: Informatica #352 and Hortonworks #117. Don’t forget to participate in our Passport Program and join our session at 5:45 pm ET on Thursday, October 16, 2014.
Managing the recovery and flow of data files throughout your enterprise is much like managing the flow of oil from well to refinery – a wide range of tasks must be carefully completed to ensure optimal resource recovery. If these tasks are not handled properly, or are not addressed in the correct order, valuable resources may be lost. When the process involves multiple pipelines, systems, and variables, managing the flow of data can be difficult.
Organizations have many options for automating the processes of gathering data, transferring files, and executing key IT jobs, including home-built scheduling solutions, system-integrated schedulers, and enterprise schedulers. Enterprise schedulers, such as Skybot Scheduler, typically offer the most control over an organization's workflow, because they can build schedules that connect different applications, systems, and platforms.
In this way, the enterprise scheduler facilitates the transfer of data into and out of Informatica PowerCenter and Informatica Cloud, and ensures that raw materials are refined into valuable resources.
Enterprise Scheduling Automates Your Workflow
Think of an enterprise scheduler as the pipeline bearing data from its source to the refinery. Rather than allowing jobs or processes to execute randomly or to sit idle, the enterprise scheduler automates your organization’s workflow, ensuring that tasks are executed under the appropriate conditions without the need for manual monitoring or the risk of data loss.
Skybot Scheduler addresses the most common pain points associated with data recovery, including:
- Scheduling dependencies: In order for PowerCenter or Cloud to complete the data gathering processes, other dependencies must be addressed. Information must be swept and updated, and files may need to be reformatted. Skybot Scheduler automates these tasks, keeping the data recovery process consistently moving forward.
- Reacting to key events: As with oil recovery, small details can derail the successful mining of data. Key events, such as directory changes, file arrivals, and evaluation requirements can lead to a clog in the pipeline. Skybot Scheduler maintains the flow of data by recognizing these key events and reacting to them automatically.
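The event-driven pattern described above can be sketched in a few lines. This is a minimal illustration only: the directory, file names, and workflow names are hypothetical, and a real enterprise scheduler replaces this polling loop with robust event detection, logging, and retry handling. (`pmcmd` is PowerCenter's command-line client; the folder and workflow names passed to it here are made up.)

```python
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("/data/incoming")   # hypothetical landing directory
TRIGGER = WATCH_DIR / "orders.csv"   # the chain fires when this file arrives

# Dependent steps, run strictly in order: reformat the file, then load it
# via a PowerCenter workflow. Both commands are illustrative placeholders.
JOB_CHAIN = [
    ["python", "reformat.py", str(TRIGGER)],
    ["pmcmd", "startworkflow", "-f", "ETL_Folder", "wf_load_orders"],
]

def run_chain(chain=JOB_CHAIN):
    """Run each dependent job in order; stop the chain if any step fails."""
    for cmd in chain:
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            print(f"command not found: {cmd[0]}")
            return False
        if result.returncode != 0:
            print(f"step failed: {' '.join(cmd)}")
            return False
    return True

def watch(poll_seconds=30):
    """Poll for the file-arrival event, then kick off the dependent jobs."""
    while not TRIGGER.exists():
        time.sleep(poll_seconds)
    run_chain()
```

The key idea the sketch captures is the same one the bullets make: downstream jobs never run until their upstream dependency (the file arrival, the reformat step) has completed successfully.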
Choose the Best Pipeline Available
Skybot Scheduler is one of the most powerful enterprise scheduling solutions available today, and is the only enterprise scheduler integrated with PowerCenter and Cloud.
Capable of creating comprehensive cross-platform automation schedules, Skybot Scheduler manages the many steps in the process of extracting, transforming, and loading data. Skybot maintains the flow of data by recognizing directory changes and other key events, and reacting to them automatically.
In short, by managing your workflow, Skybot Scheduler increases the efficiency of ETL processes and reduces the potential of a costly error.
To learn more about the power of enterprise scheduling and Skybot Scheduler, check out this webinar: Improving Informatica ETL Processing with Enterprise Job Scheduling, or download the Free Trial.
On a recent trip to a new city, someone said that the easiest way from the airport to the hotel was to use the Metro. I could speak the language, but reading it was another matter. I was surprised by how quickly I navigated to the hotel by following the Metro map. The Metro map is based on the successful design of the London Underground map.
Harry Beck was not a cartographer. He was an engineering draftsman who started drawing a different type of map in his spare time. Beck believed that passengers were not worried about the accuracy of distances on the map. He reduced the map to straight lines and sharp angles, producing something closer to an electrical schematic diagram than a conventional geographic map. The company that ran the London Underground was skeptical of Beck's map, since it was radically different and they had not commissioned the project.
This is the second installment of my multi-part blog series on “hitting the batch wall.” Well, it’s not so much about hitting the batch wall, but what you can do to avoid hitting the wall. Today’s topic is “throwing hardware” at the problem (a.k.a. hardware scaling). I’ll discuss the common approaches and the tradeoffs of hardware scaling with Informatica software.
Before I can begin to discuss hardware scaling, I start with this warning: faster hardware only improves the load-window situation when it resolves a bottleneck. Data integration jobs are a lot like rush hour traffic: they can only run as fast as the slowest component. It doesn't make any sense to buy a Ferrari if you will always be driving behind a garbage truck. In other words, if your ETL jobs are constrained by the source/target systems, by I/O, or even just by memory, then faster or more CPUs will rarely improve the situation. Understand your bottlenecks before you start throwing hardware at them!
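The rush-hour point can be made concrete with a little arithmetic: in a pipeline, end-to-end throughput is bounded by the slowest stage, so speeding up any other stage buys nothing. A tiny sketch (the stage names and rates below are hypothetical, not measurements):

```python
def pipeline_throughput(stage_rates):
    """Rows/sec a pipeline can sustain: the minimum across its stages."""
    return min(stage_rates.values())

rates = {
    "source_reads": 50_000,   # rows/sec the source system can deliver
    "transform":    120_000,  # rows/sec the CPUs can transform
    "target_loads": 45_000,   # rows/sec the target can commit
}

print(pipeline_throughput(rates))       # 45000 -- bound by the target load

# Doubling CPU speed helps only the transform stage...
faster_cpu = dict(rates, transform=240_000)
print(pipeline_throughput(faster_cpu))  # 45000 -- throughput is unchanged
```

Doubling the CPU rate leaves throughput exactly where it was, because the target load was the bottleneck all along; that is the Ferrari stuck behind the garbage truck.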
This year marks the 20th anniversary for Informatica. Twenty years of solving the problem of getting data from point A to point B, improving its quality, establishing a single view, and managing it over its life cycle. Yet after 20 years of innovation and leadership in the data integration market, when one would think the problem had been solved and all data had been extracted, transformed, cleansed, and managed, it hasn't: companies still need data integration. Why? Data is a complicated business. And with data increasingly central to business survival, organizations are constantly looking for ways to unlock new sources of it, use it as an unforeseen source of insight, and do it all with greater agility and at lower cost.
In a recent webinar, Mark Smith, CEO at Ventana Research, and David Lyle, vice president of Product Strategy at Informatica, discussed "Building the Business Case and Establishing the Fundamentals for Big Data Projects." Mark pointed out that the second biggest barrier impeding big data initiatives is that the "business case is not strong enough." The first and third barriers, respectively, were "lack of resources" and "no budget," both of which also come back to having a strong business case. In this context, Dave provided a simple formula from which to build the business case:
Return on Big Data = Value of Big Data / Cost of Big Data
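As a quick illustration of the formula with made-up numbers (every figure below is hypothetical, purely to show how the ratio is read):

```python
def return_on_big_data(value, cost):
    """Dave's formula: Return on Big Data = Value of Big Data / Cost of Big Data."""
    return value / cost

# Hypothetical annual figures for a big data initiative.
value = 2_400_000  # e.g. fraud losses avoided plus analyst hours saved
cost = 800_000     # hardware, licenses, and staffing for the cluster

print(return_on_big_data(value, cost))  # 3.0 -- every dollar spent returns three
```

A ratio above 1 means the initiative creates more value than it consumes; the business case grows stronger either by raising the numerator (more sources, more use cases) or lowering the denominator (commodity hardware, less hand-coding).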
“The report of my death was an exaggeration.”
– Mark Twain
Ah yes, another conference, another old technology declared dead. Mainframe… dead. Any programming language other than Java… dead. 8-track tapes… OK, well, some things thankfully do die, along with the Ford Pinto in which I used to listen to the Beatles Greatest Hits Red Album over and over on 8-track… ah yes, the good old days. But I digress.
Ever wondered if an initiative is worth the effort? Ever wondered how to quantify its worth? This is a loaded question, as you may suspect, but I wanted to ask it nevertheless, as my team of Global Industry Consultants works with clients around the world to do just that (a.k.a. a Business Value Assessment, or BVA) for solutions anchored around Informatica's products.
As these solutions typically involve multiple core business processes stretching across multiple departments and leveraging a legion of technology components, such as ETL, metadata management, business glossary, BPM, data virtualization, legacy ERP, CRM, and billing systems, the effort initially sounds dauntingly complex. Opening this can of worms may end in measurement fatigue (I think I just discovered a new medical malaise).
In my previous blog on this subject, I talked about the incredible innovations of Hadoop as a new analytics engine, and the innovations of Informatica in removing unmaintainable, complex hand-coding. In this blog I want to drill into the world of Informatica ETL and Hadoop to show why these two innovations are critical to augmenting traditional data processing approaches as companies begin to leverage Big Data for new analytics.