An Introduction to Big Data and Hadoop from Informatica University
A full house, lots of funny names and what does it all mean?
Cloudera, Appfluent and Informatica partnered today at Informatica World in Las Vegas to jointly deliver a one-day training session, Introduction to Hadoop and Big Data. A technology overview, best practices, and how to get started were on the agenda. Of course, we needed to start off with a little history. Processing and computing were important in the old days. And even in the old days, they were hard to do and very expensive.
Today it’s all about scalability. What Cloudera does is “Spread the Data and Spread the Processing,” with Hadoop optimized for scanning lots of data. The Hadoop Distributed File System (HDFS) slices up the data into blocks, one slice after another, and spreads them across machines. MapReduce is then used to spread the processing across those slices. How does spreading the data and the processing help us with scalability?
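To make "spread the data and spread the processing" a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and was not shown in the session: the function names (`split_into_blocks`, `map_block`, `reduce_counts`) and the word-count task are assumptions chosen for illustration, and the shuffle step of a real MapReduce job is left out.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(lines, block_size):
    """Slice the input into fixed-size blocks (a loose stand-in for HDFS blocks)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: count words in one block, independently of every other block."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    # On a cluster, each block would be processed on a different node in parallel.
    # Here the blocks are simply processed one after another.
    partials = [map_block(block) for block in blocks]
    return reduce(reduce_counts, partials, Counter())


lines = [
    "big data is big",
    "hadoop spreads data",
    "hadoop spreads processing",
]
result = word_count(lines)
```

Because each block is counted on its own, adding more machines lets more blocks be processed at the same time, which is the scalability idea behind the slogan.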
When we spread the data and processing, we also need to index the data. How do we do this? We add Gets and Puts. That’s Get a Row, Put a Row. Basically, this is what helps us find a single row of data easily. Processing millions of rows of data is more and more a reality for many businesses today. Once we can find and process a row of data easily, we can focus on our data analysis.
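The Get/Put idea can be sketched as a keyed row store. The example below is a plain Python illustration, not the API of any Hadoop component: the `RowStore` class, its `put` and `get` methods, and the sample row keys are all hypothetical. In the Hadoop ecosystem this style of access is commonly associated with HBase, though the session summary above does not name a specific product.

```python
class RowStore:
    """Toy in-memory store that finds rows by key instead of scanning all data."""

    def __init__(self):
        self._rows = {}  # row key -> row contents

    def put(self, row_key, row):
        """Put a row: store (or overwrite) the row under its key."""
        self._rows[row_key] = dict(row)

    def get(self, row_key):
        """Get a row: look it up directly by key; returns None if absent."""
        return self._rows.get(row_key)


store = RowStore()
store.put("customer#1001", {"name": "Ada", "city": "Las Vegas"})
store.put("customer#1002", {"name": "Grace", "city": "Reno"})

found = store.get("customer#1001")    # direct lookup, no scan over other rows
missing = store.get("customer#9999")  # unknown key
```

The point of the index is that a lookup goes straight to the row for a given key, rather than reading through millions of rows to find it.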
Data analysis: what’s important to you and your business? Appfluent gives us the map to identify which data and workloads to offload and archive to Hadoop. It helps us assess what is not necessary to load into the data warehouse. With the exponential growth in the volume and types of data, the data warehouse will soon cost too much unless we identify what to load and what to offload.
Informatica has the tools to help you process your data: tools that understand Hadoop and that you already use today. This helps you manage these volumes of data in a cost-effective way. Add to that the ability to reuse what you have already developed. Now that makes these new tools and technologies exciting.
In this Big Data and Hadoop session, #INFA14, you will learn:
- Common terminologies used in Big Data
- Technologies, tools, and use cases associated with Hadoop
- How to identify and qualify the most appropriate jobs for Hadoop
- Options and best practices for using Hadoop to improve processes and increase efficiency
Live action at Informatica World 2014, May 12 9:00 am – 5:00 pm and updates at: