Tag Archives: Cloudera

Data Visibility From the Source to Hadoop and Beyond with Cloudera and Informatica Integration

This is a guest post by Amr Awadallah, Founder, CTO at Cloudera, Inc.

It takes a village to build mainstream big data solutions. We often get so caught up in Hadoop use cases and customer successes that sometimes we don’t talk enough about the innovative partner technologies and integrations that enable our customers to put the enterprise data hub at the core of their data architecture and innovate with confidence. Cloudera and Informatica have been working together to integrate our products to enable new levels of productivity and lower deployment and production risk.

Going from Hadoop to an enterprise data hub means a number of things. It means that you recognize the business value of capturing and leveraging all your data for exploration and analytics. It means you’re ready to make the move from a Hadoop pilot project to production. And it means your data is important enough that it’s worth securing it and making its data pipelines visible. It’s the visibility layer, and in particular, the unique integration between Cloudera Navigator and Informatica that I want to focus on in this post.

The era of big data has ushered in increased regulations in a number of industries – banking, retail, healthcare, energy – most of which deal with how data is managed throughout its lifecycle. Cloudera Navigator is the only native end-to-end solution for governance in Hadoop. It provides visibility for analysts to explore data in Hadoop, and enables administrators and managers to maintain a full audit history for HDFS, HBase, Hive, Impala, Spark and Sentry, and then run reports on data access for auditing and compliance. The integration of Informatica Metadata Manager in the Big Data Edition with Cloudera Navigator extends this level of visibility and governance beyond the enterprise data hub.

Today, only Informatica and Cloudera provide end-to-end data lineage from source systems through Hadoop and into BI/analytic and data warehouse systems. And you can view it all from a single pane within Informatica.

This is important because Hadoop, and the enterprise data hub in particular, doesn’t function in a silo. It’s an integrated part of a larger enterprise-wide data management architecture. The better the insight into where data originated, where it traveled, who had access to it and what they did with it, the greater our ability to report and audit. No other combination of technologies provides this level of audit granularity.

But more than that, the visibility that Cloudera and Informatica provide gives our joint customers the ability to confidently stand up an enterprise data hub as part of their production enterprise infrastructure, because they can verify the integrity of the data that undergirds their analytics. I encourage you to check out a demo of the Informatica-Cloudera Navigator integration at this link: http://infa.media/1uBpPbT

You can also check out a demo and learn a little more about Cloudera Navigator and the Informatica integration in the recorded TechTalk hosted by Informatica at this link:

http://www.informatica.com/us/company/informatica-talks/?commid=133311
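
To make end-to-end lineage a little more concrete, here is a minimal Python sketch. It assumes lineage is recorded as a simple graph in which each dataset lists the datasets it was derived from; the dataset names and the `upstream`/`sources` helpers are hypothetical illustrations, not part of the Informatica Metadata Manager or Cloudera Navigator APIs. Walking the graph backward from a BI report recovers each hop through Hadoop to the original source systems.

```python
# Toy model of end-to-end data lineage. All dataset names and helpers are
# hypothetical; this is NOT the Informatica or Cloudera Navigator API.
from collections import deque

# Each dataset maps to the datasets it was directly derived from.
LINEAGE: dict[str, list[str]] = {
    "bi.revenue_dashboard": ["dw.sales_fact"],
    "dw.sales_fact": ["hive.cleaned_orders"],
    "hive.cleaned_orders": ["hdfs.raw_orders", "hdfs.raw_customers"],
    "hdfs.raw_orders": ["crm.orders_table"],
    "hdfs.raw_customers": ["crm.customers_table"],
}


def upstream(dataset: str) -> list[str]:
    """Return every dataset `dataset` depends on, nearest ancestors first (BFS)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(LINEAGE.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # a dataset can feed several others; visit it once
            continue
        seen.add(node)
        order.append(node)
        queue.extend(LINEAGE.get(node, []))
    return order


def sources(dataset: str) -> list[str]:
    """Return the original source systems: ancestors with no recorded parents."""
    return [d for d in upstream(dataset) if d not in LINEAGE]


if __name__ == "__main__":
    print(upstream("bi.revenue_dashboard"))
    print(sources("bi.revenue_dashboard"))
```

A breadth-first walk lists the nearest ancestors first, so an auditor sees the data warehouse table before the raw HDFS files and, finally, the CRM source tables. The audit use case in this post would also need who accessed each dataset and when, which this sketch leaves out.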

Posted in Big Data, Cloud Data Integration, Governance, Risk and Compliance, Hadoop

That’s a Wrap – Informatica World 2012!

What an amazing week we had last week in Vegas with the Informatica community! I hope you all enjoyed the conference as much as I did. Having spent the last year pulling together the various components, I find it wonderful when it all comes together! There were many memorable highlights, ranging from the Product Councils on Monday and the pool party that evening to the Tuesday keynotes, including the launch of the Informatica 9.5 Platform, along with the breakouts, hands-on labs, the Executive Summit, the Advisory Boards, the party at Haze nightclub and the closing keynotes.

Here are a few of my favorites – what are yours? (more…)

Posted in Big Data, Informatica 9.5, Informatica Events, Informatica University

Hadoop Tuesday Update: Discovering Hadoop’s Vibrant Open Source Community

There’s a historical parallel for Hadoop’s rapidly growing ecosystem and the excitement around it – the Linux operating system followed a similar trajectory more than a decade ago. At that time, as companies embraced the open source system, a vibrant ecosystem of users, vendors and community supporters evolved to move the technology forward and add value.

Now we see the same thing happening with Big Data, as an impressive ecosystem emerges around Hadoop. “This is a very strong and vibrant and varied community,” Matt Aslett, analyst with The 451 Group, pointed out during the recent Hadoop Tuesdays webcast. “It very much reminds us of the early early stages of Linux, where you have vendors and users who each have something to gain from Hadoop being successful.” (more…)

Posted in Data Integration

Hadoop Security: Part 6 of Hadoop Series

Security is a work in progress for the Apache Hadoop project and its sub-projects, as I discuss in an O’Reilly Hadoop tutorial, “Get started with Hadoop: from evaluation to your first production cluster”. Below are several of the security tips and best practices from that article. (more…)

Posted in Big Data

Hadoop Toolbox: Part 5 of Hadoop Series

Many organizations will mix and match individual Apache projects and sub-projects using Apache Hadoop’s loosely coupled architecture. This Hadoop toolbox provides a powerful set of tools and capabilities, but it does have some important limitations that can require a platform approach to address.

In a Hadoop cluster, each data node combines storage in the Hadoop Distributed File System (HDFS) with processing. With HDFS, you can add new files or append to existing files, but you cannot rewrite an existing file’s contents in place; changed content has to be written under a new filename. The append capability works well for adding new time-stamped logs as they come in, but can complicate the storage of structured files. (more…)
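
As a rough illustration of these write semantics, here is a toy in-memory model in Python. The `AppendOnlyStore` class and the file paths are hypothetical and are not the HDFS client API; the sketch only mirrors the rule described above: files can be created and appended to, but an existing file’s contents are never rewritten in place, so a changed version of a structured file goes under a new filename.

```python
# Toy model of append-only file semantics. The class and paths are
# hypothetical; this is NOT the HDFS client API.


class AppendOnlyStore:
    def __init__(self) -> None:
        self._files: dict[str, bytearray] = {}

    def create(self, path: str, data: bytes = b"") -> None:
        # Refuse to reuse a name: changed content must go to a new filename.
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = bytearray(data)

    def append(self, path: str, data: bytes) -> None:
        # Adding bytes to the end of an existing file is allowed.
        if path not in self._files:
            raise FileNotFoundError(path)
        self._files[path].extend(data)

    def read(self, path: str) -> bytes:
        return bytes(self._files[path])

    # Deliberately no method for writing at an offset: in-place edits are
    # not part of this model.


if __name__ == "__main__":
    store = AppendOnlyStore()
    # Time-stamped log lines fit the append model well.
    store.create("/logs/app.log", b"10:00 login\n")
    store.append("/logs/app.log", b"10:05 logout\n")

    # "Updating" a structured file means writing a new version under a new name.
    store.create("/tables/customers_v1.csv", b"id,name\n1,Ann\n")
    try:
        store.create("/tables/customers_v1.csv", b"id,name\n1,Anne\n")
    except FileExistsError:
        store.create("/tables/customers_v2.csv", b"id,name\n1,Anne\n")

    print(store.read("/logs/app.log").decode())
```

The log file grows naturally through appends, while the customer table needs a `_v2` copy for a one-field change, which is the complication the paragraph points to for structured data.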

Posted in Big Data

Dating With Data: Part 4 of Hadoop Series

eHarmony, an online dating service, uses Hadoop processing and the Hive data warehouse for analytics to match singles based on each individual’s “29 Dimensions® of Compatibility”, per a June 2011 press release by eHarmony and one of its suppliers, SeaMicro. According to eHarmony, an average of 542 eHarmony members marry daily in the United States. (more…)

Posted in Big Data