Tag Archives: cloud-based

How Parallel Data Loading and Amazon Redshift Redefine Data Warehousing Performance

As Informatica Cloud product managers, we spend a lot of our time thinking about things like relational databases. Recently, we’ve been considering their limitations and, specifically, how difficult and expensive it is to provision an on-premise data warehouse to handle the petabytes of fluid data generated by cloud applications and social media. As a result, companies often have to make tradeoffs and decide which data is worth putting into their data warehouse.

Certainly, relational databases have enormous value. They’ve been around for several decades and have served as a bulwark for storing and analyzing structured data. Without them, we wouldn’t be able to extract and store data from on-premise CRM, ERP and HR applications and push it downstream for BI applications to consume.

With the advent of cloud applications and social media, however, we are now faced with managing a daily barrage of rapidly changing data, as well as the complexities of analyzing it within the same context as data from on-premise applications. Add to that the stream of data coming from Big Data sources such as Hadoop, which then needs to be organized into a structured format so that BI applications can run various correlation analyses, and you can begin to understand the enormity of the problem.

Up until now, the only solution has been to throw development resources at legacy on-premise databases, and hope for the best. But given the cost and complexity, this is clearly not a sustainable long-term strategy.

As an alternative, Amazon Redshift, a petabyte-scale data warehouse service in the cloud, has the right combination of performance and capabilities to handle the demands of social media and cloud app data, without the additional complexity or expense. Its Massively Parallel Processing (MPP) architecture allows for lightning-fast loading and querying of data. It also features a larger block size than typical relational databases, which reduces the number of I/O requests needed to load data and leads to better performance.
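To make the parallel loading idea concrete, here is a minimal sketch of one common way to prepare data for it: split the input into several compressed files that share a key prefix, then issue a single COPY against that prefix so the load can be spread across the cluster. The table name, bucket, prefix, and IAM role below are hypothetical placeholders, the helper functions are our own illustration, and the upload step is left out entirely.

```python
import csv
import gzip
import io


def split_rows(rows, num_parts):
    """Distribute rows round-robin into num_parts roughly equal lists."""
    parts = [[] for _ in range(num_parts)]
    for i, row in enumerate(rows):
        parts[i % num_parts].append(row)
    return parts


def to_gzipped_csv(rows):
    """Serialize rows to gzip-compressed CSV bytes, ready to upload."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_copy_statement(table, s3_prefix, iam_role):
    """Build a COPY statement that loads every object sharing the key prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV GZIP;"
    )


# Hypothetical sample data and names, for illustration only.
rows = [(i, f"user{i}@example.com") for i in range(10)]
parts = split_rows(rows, num_parts=4)

# One compressed object per part, all under the same key prefix.
payloads = {
    f"customers/part_{n:04d}.csv.gz": to_gzipped_csv(part)
    for n, part in enumerate(parts)
}

copy_sql = build_copy_statement(
    table="customers",
    s3_prefix="s3://example-bucket/customers/part_",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

In practice the number of parts would be chosen to suit the size of the cluster rather than hard-coded; four is used here only to keep the example small.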

By combining Informatica Cloud with Amazon Redshift’s parallel loading architecture, you can make use of push-down optimization, which processes data transformations in whichever source or target database engine is best suited to the work. Informatica Cloud also offers native connectivity to cloud and social media apps, such as Salesforce, NetSuite, Workday, LinkedIn, and Twitter, to name a few, which makes it easy to funnel data from these apps into your Amazon Redshift cluster at faster speeds.
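The general idea behind push-down optimization can be illustrated with a small sketch. This is not Informatica Cloud's implementation; it uses Python's built-in sqlite3 module as a stand-in database, and the orders table and function names are hypothetical. The first function pulls every row out and aggregates on the client, while the second expresses the same transformation as SQL so the database engine does the work and only the results travel back.

```python
import sqlite3

# In-memory stand-in for a source or target database (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0), ("west", 25.0)],
)


def totals_without_pushdown(conn):
    """Fetch every row and aggregate on the client side."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_with_pushdown(conn):
    """Push the aggregation into the database as a single SQL statement."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    return dict(conn.execute(query))


client_side = totals_without_pushdown(conn)
pushed_down = totals_with_pushdown(conn)
```

Both functions return the same totals; the difference is where the computation runs and how much data has to leave the database to get there.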

If you’re at the Amazon Web Services Summit today in New York City, then you heard our announcement that Informatica Cloud is offering a free 60-day trial for Amazon Redshift with no limitations on the number of rows, jobs, application endpoints, or scheduling. If you’d like to learn more, please visit our Redshift Trial page or go directly to the trial.

Posted in Cloud, Cloud Computing, Cloud Data Integration, Cloud Data Management

MDM Becoming More Critical in Light of Cloud Computing

Over the weekend, noted blogger David Linthicum published a blog post on eBiz about master data management (MDM) and cloud computing. The crux of David’s argument is that while the spread of cloud computing will exacerbate the need for MDM, the rush to embrace cloud applications could drive MDM into the background at many companies. Ironically, because organizations can save so much money by replacing big enterprise systems with lighter SaaS applications, “MDM will be an afterthought” in that headlong rush and will get pushed aside even as the need for it intensifies.

I agree with David that the migration to cloud computing is going to further spark demand for MDM, but I don’t agree that MDM is going to get pushed aside. The reason I make this argument is that we already have a few customers at Siperian who are using MDM with cloud-based applications, and it’s working out very well. These customers are combining MDM with the cloud in the following two ways:

1. Using MDM to create a single version of the truth before enabling the cloud-based applications (i.e., they’re cleaning up data from multiple in-house CRM systems, and feeding reliable, consistent customer data into Salesforce.com)

2. Combining customer and other forms of data from the cloud-based applications (e.g., Salesforce.com) with data from internal CRM applications to create a single version of the truth that enables operational and analytical business processes.
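As a rough illustration of what creating a single version of the truth involves, the sketch below merges customer records from several systems into one consolidated record per customer. It is a deliberately simplified assumption of how matching and merging might work, not a description of Siperian's product: records are matched on a normalized email address, and the most recently updated non-empty value wins for each field. All names, fields, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    source: str   # system the record came from
    email: str
    name: str
    phone: str
    updated: str  # ISO date, so string order matches date order


def match_key(record):
    """Normalize the email so trivially different spellings match."""
    return record.email.strip().lower()


def build_golden_records(records):
    """Merge records per match key; newer non-empty values overwrite older ones."""
    golden = {}
    for rec in sorted(records, key=lambda r: r.updated):
        key = match_key(rec)
        merged = golden.setdefault(
            key, {"email": key, "name": "", "phone": "", "sources": []}
        )
        if rec.name:
            merged["name"] = rec.name
        if rec.phone:
            merged["phone"] = rec.phone
        if rec.source not in merged["sources"]:
            merged["sources"].append(rec.source)
    return golden


# Hypothetical records from two in-house CRM systems and a cloud application.
records = [
    CustomerRecord("in_house_crm_a", "Ann@Example.com", "Ann Lee", "", "2013-01-10"),
    CustomerRecord("in_house_crm_b", "ann@example.com ", "Ann M. Lee", "555-0100", "2013-03-02"),
    CustomerRecord("salesforce", "bob@example.com", "Bob Ray", "555-0199", "2013-02-15"),
]

golden = build_golden_records(records)
```

Real MDM matching is usually fuzzier than an exact email comparison, and survivorship rules are typically configured per field, but the shape of the problem is the same: match, merge, and keep track of where each value came from.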

As organizations grow the number of cloud-based applications, they have to control the key data that they will use across those applications as well as internal applications and data warehouses. MDM enables organizations to do just that, either for enabling cloud-based applications or for creating a single view of the master data across cloud-based applications and internal applications. Thus, a strong foundation of MDM will be the key to successfully taking advantage of cloud computing.

Posted in Master Data Management