Tag Archives: Cloudera
What an amazing week we had last week in Vegas with the Informatica community. I hope you all enjoyed the conference as much as I did. Having spent the last year pulling together the various components, it is wonderful when it all comes together! There were many memorable highlights, ranging from the Product Councils on Monday and the pool party that evening to the Tuesday keynotes (including the launch of the Informatica 9.5 Platform), breakouts, hands-on labs, the Executive Summit, the Advisory Boards, the party at Haze nightclub and the closing keynotes.
Here are a few of my favorites – what are yours? (more…)
There’s a historic parallel for Hadoop’s rapidly growing ecosystem and excitement – the Linux operating system had a similar trajectory more than a decade ago. At that time, as companies embraced the open source system, a vibrant ecosystem of users, vendors and community supporters evolved to move the technology forward and add value.
Now, we see the same thing happening with Big Data, as an impressive ecosystem emerges around Hadoop. “This is a very strong and vibrant and varied community,” Matt Aslett, analyst with The 451 Group, pointed out at the recent Hadoop Tuesdays webcast. “It very much reminds us of the early stages of Linux, where you have vendors and users who each have something to gain from Hadoop being successful.” (more…)
Security is a work-in-progress for the Apache Hadoop project and sub-projects, as I discuss in an O’Reilly Hadoop tutorial, “Get started with Hadoop: from evaluation to your first production cluster”. Below are several of the security tips and best practices from that article. (more…)
Many organizations will mix and match individual Apache projects and sub-projects using Apache Hadoop’s loosely coupled architecture. This Hadoop toolbox provides a powerful set of tools and capabilities, but it does have some important limitations that can require a platform approach to address.
The Hadoop Distributed File System (HDFS) combines storage and processing in each data node. With HDFS, you can add new files or append to existing files, but you cannot replace a file without using a new filename. The append capability works well for adding new time-stamped logs as they come in, but can complicate storage of structured files. (more…)
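To make the add/append/no-replace behavior described above concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Hadoop API: the class `AppendOnlyStore`, its methods, and the file paths are all hypothetical names chosen for illustration, and it only mirrors the rules as stated in this post.

```python
class AppendOnlyStore:
    """Toy in-memory model of add/append-only file rules (hypothetical, not the Hadoop API)."""

    def __init__(self):
        self._files = {}  # maps path -> file contents as bytes

    def create(self, path, data=b""):
        # Adding a new file is allowed; reusing an existing name is not.
        if path in self._files:
            raise FileExistsError(f"{path} already exists; use a new filename")
        self._files[path] = bytes(data)

    def append(self, path, data):
        # Appending to an existing file is allowed.
        if path not in self._files:
            raise FileNotFoundError(path)
        self._files[path] += bytes(data)

    def read(self, path):
        return self._files[path]


store = AppendOnlyStore()

# Log-style data: append new entries as they arrive.
store.create("/logs/app.log", b"start\n")
store.append("/logs/app.log", b"heartbeat\n")

# Structured data: a corrected version has to go under a new name.
store.create("/tables/customers.v1.csv", b"id,name\n1,Ann\n")
try:
    store.create("/tables/customers.v1.csv", b"id,name\n1,Anne\n")
    replaced = True
except FileExistsError:
    replaced = False
    store.create("/tables/customers.v2.csv", b"id,name\n1,Anne\n")
```

The log file grows naturally through appends, while the corrected table ends up as a second, versioned file. That versioning burden is one way the complication with structured files can show up in practice.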
eHarmony, an online dating service, uses Hadoop processing and the Hive data warehouse for analytics to match singles based on each individual’s “29 Dimensions® of Compatibility”, per a June 2011 press release by eHarmony and one of its suppliers, SeaMicro. According to eHarmony, an average of 542 eHarmony members marry daily in the United States. (more…)