How Parallel Data Loading and Amazon Redshift Redefine Data Warehousing Performance
As Informatica Cloud product managers, we spend a lot of our time thinking about relational databases. Recently, we’ve been considering their limitations, specifically how difficult and expensive it is to provision an on-premise data warehouse to handle the petabytes of fluid data generated by cloud applications and social media. As a result, companies often have to make tradeoffs and decide which data is worth putting into their data warehouse.
Certainly, relational databases have enormous value. They’ve been around for several decades and have served as a bulwark for storing and analyzing structured data. Without them, we wouldn’t be able to extract and store data from on-premise CRM, ERP and HR applications and push it downstream for BI applications to consume.
With the advent of cloud applications and social media, however, we now face a daily barrage of massive amounts of rapidly changing data, as well as the complexity of analyzing it in the same context as data from on-premise applications. Add to that the stream of data coming from Big Data sources such as Hadoop, which must be organized into a structured format so that BI applications can run correlation analyses on it, and you can begin to understand the enormity of the problem.
Up until now, the only solution has been to throw development resources at legacy on-premise databases, and hope for the best. But given the cost and complexity, this is clearly not a sustainable long-term strategy.
As an alternative, Amazon Redshift, a petabyte-scale data warehouse service in the cloud, has the right combination of performance and capabilities to handle the demands of social media and cloud app data without the additional complexity or expense. Its Massively Parallel Processing (MPP) architecture spreads loading and querying across multiple nodes, so both run quickly. It also uses a larger block size, which reduces the number of I/O requests needed to load data and leads to better performance.
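To make the block-size and parallel-loading points concrete, here is a minimal Python sketch. It is illustrative arithmetic only, not Redshift's internals: the helper names (io_requests, split_round_robin) are hypothetical, the 8 KiB figure is an assumed page size for a traditional database chosen for comparison, and the 1 MiB figure reflects Redshift's documented block size.

```python
import math


def io_requests(data_bytes: int, block_bytes: int) -> int:
    """Number of block-sized reads needed to cover data_bytes."""
    return math.ceil(data_bytes / block_bytes)


def split_round_robin(rows, parts):
    """Deal rows into `parts` chunks so each chunk can be loaded in parallel."""
    chunks = [[] for _ in range(parts)]
    for i, row in enumerate(rows):
        chunks[i % parts].append(row)
    return chunks


GIB = 1024 ** 3

# Assumed 8 KiB pages (hypothetical traditional database) vs. 1 MiB blocks.
small_block_reads = io_requests(GIB, 8 * 1024)
large_block_reads = io_requests(GIB, 1024 * 1024)

# Ten rows dealt across four hypothetical parallel loaders.
chunks = split_round_robin(list(range(10)), 4)

if __name__ == "__main__":
    print(small_block_reads, large_block_reads)
    print(chunks)
```

Under these assumptions, scanning 1 GiB takes 128 times fewer requests with the larger block, and splitting the input into evenly sized chunks is what lets several workers load at the same time.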
By combining Informatica Cloud with Amazon Redshift’s parallel loading architecture, you can make use of push-down optimization, which processes data transformations in whichever source or target database engine is best suited to the work. Informatica Cloud also offers native connectivity to cloud and social media apps, such as Salesforce, NetSuite, Workday, LinkedIn, and Twitter, to name a few, which makes it easy to funnel data from these apps into your Amazon Redshift cluster quickly.
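The general idea behind push-down can be sketched in a few lines of Python. This is not Informatica Cloud's algorithm or API: it uses the standard library's sqlite3 module as a stand-in for the target database, and the table, column, and function names are hypothetical. It contrasts pulling every row out to transform it on the client with sending the transformation to the database engine as SQL.

```python
import sqlite3


def make_db():
    """Create an in-memory database standing in for the target warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 1.5)],
    )
    return conn


def totals_client_side(conn):
    """Without push-down: fetch every row, then aggregate in the client."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """With push-down: the database engine aggregates; only results return."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    return dict(conn.execute(query))


if __name__ == "__main__":
    conn = make_db()
    print(totals_client_side(conn))
    print(totals_pushed_down(conn))
```

Both functions return the same totals, but the pushed-down version moves one row per region across the connection instead of every order, which is where the savings come from on large tables.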
If you’re at the Amazon Web Services Summit today in New York City, then you heard our announcement that Informatica Cloud is offering a free 60-day trial for Amazon Redshift with no limitations on the number of rows, jobs, application endpoints, or scheduling. If you’d like to learn more, please visit our Redshift Trial page or go directly to the trial.