Tag Archives: amazon redshift
Amazon Redshift, one of the fast-rising stars in the AWS ecosystem, has taken the data warehousing world by storm ever since it was introduced almost two years ago. Amazon Redshift operates completely in the cloud and allows you to provision nodes on demand. This model lets you overcome many of the pains associated with traditional data warehousing, such as provisioning extra server hardware, sizing and preparing databases for loading, or extensive SQL scripting.
However, when loading data into Redshift, you may find it challenging to do so in a timely manner. To reduce the load time, you may have to spend a tremendous amount of time writing and tuning SQL, which undermines the value proposition of using Redshift in the first place.
Informatica Cloud helps you load this data quickly into Redshift in just a few minutes. To start using Informatica Cloud, you’ll need to establish connections from Redshift and your other data source first. Here are a few easy steps to help you get started with establishing connections from a relational database such as MySQL as well as Redshift into Informatica Cloud:
- Log in to your Informatica Cloud account, go to Configure -> Connections, click “New”, and select “MySQL” for “Type”
- Select your Secure Agent and fill in the rest of the database details
- Test your connection and then click ‘OK’ to save and exit
- Now, log in to your AWS account and go to the Redshift service page
- Go to your cluster configuration page and make a note of the cluster and cluster database properties: Number of Nodes, Endpoint, Port, Database Name, JDBC URL. You will also need:
  - The Redshift database user name and password (which are different from your AWS account credentials)
  - AWS account Access Key
  - AWS account Secret Key
- Exit the AWS console.
- Now, back in your Informatica Cloud account, go to Configure -> Connections and click “New”.
- Select “AWS Redshift (Informatica)” for “Type” and fill in the rest of the details from the information you have from above
- Test the connection and then click ‘OK’ to save and exit
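To show how the cluster properties noted above fit together, here is a minimal Python sketch that assembles a JDBC URL from the Endpoint, Port, and Database Name. The class name, the endpoint, and the database name are hypothetical placeholders, and 5439 is only Redshift's default port, so use the values shown on your own cluster configuration page. The URL scheme is also an assumption: current Amazon drivers use `jdbc:redshift://`, while older setups used `jdbc:postgresql://`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedshiftConnection:
    """Connection properties noted from the cluster configuration page."""
    endpoint: str      # cluster endpoint host name
    database: str      # cluster database name
    port: int = 5439   # Redshift's default port; use the value shown for your cluster

    def jdbc_url(self, scheme: str = "redshift") -> str:
        # Older drivers and consoles used "postgresql" as the scheme.
        return f"jdbc:{scheme}://{self.endpoint}:{self.port}/{self.database}"


# Hypothetical placeholder values, not a real cluster.
conn = RedshiftConnection(
    endpoint="examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    database="dev",
)
print(conn.jdbc_url())
```

The assembled string should match the JDBC URL displayed on the cluster page; if it does not, trust the console value.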
As you can see, establishing connections is straightforward and can be done in less than 5 minutes. To learn how customers such as UBM used Informatica Cloud to deliver next-generation customer insights with Amazon Redshift, please join us on September 16 for a webinar where product experts from Amazon and UBM will explain how your company can benefit from cloud data warehousing for petabyte-scale analytics using Amazon Redshift.
As Informatica Cloud product managers, we spend a lot of our time thinking about things like relational databases. Recently, we’ve been considering their limitations, and, specifically, how difficult and expensive it is to provision an on-premise data warehouse to handle the petabytes of fluid data generated by cloud applications and social media. As a result, companies often have to make tradeoffs and decide which data is worth putting into their data warehouse.
Certainly, relational databases have enormous value. They’ve been around for several decades and have served as a bulwark for storing and analyzing structured data. Without them, we wouldn’t be able to extract and store data from on-premise CRM, ERP and HR applications and push it downstream for BI applications to consume.
With the advent of cloud applications and social media, however, we are now faced with managing a daily barrage of massive amounts of rapidly changing data, as well as the complexities of analyzing it within the same context as data from on-premise applications. Add to that the stream of data coming from Big Data sources such as Hadoop, which then needs to be organized into a structured format so that BI applications can run various correlation analyses, and you can begin to understand the enormity of the problem.
Up until now, the only solution has been to throw development resources at legacy on-premise databases, and hope for the best. But given the cost and complexity, this is clearly not a sustainable long-term strategy.
As an alternative, Amazon Redshift, a petabyte-scale data warehouse service in the cloud, has the right combination of performance and capabilities to handle the demands of social media and cloud app data, without the additional complexity or expense. Its Massively Parallel Processing (MPP) architecture allows for fast loading and querying of data. It also features a larger block size, which reduces the number of I/O requests needed to load data and leads to better performance.
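The effect of block size on I/O can be illustrated with some rough arithmetic. The Python sketch below counts how many fixed-size blocks are needed to hold the same amount of data at two block sizes. The 1 MB figure is the block size AWS documents for Redshift; the 8 KB figure and the 1 GB data volume are assumptions chosen for illustration, and the function name is hypothetical. Block count is only a proxy for I/O requests and ignores compression and columnar storage.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def blocks_needed(data_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks required to hold data_bytes."""
    return math.ceil(data_bytes / block_bytes)


data = 1 * GB                          # hypothetical amount of data
small = blocks_needed(data, 8 * KB)    # assumed block size of a conventional database
large = blocks_needed(data, 1 * MB)    # block size AWS documents for Redshift

print(small, large, small // large)    # prints: 131072 1024 128
```

Under these assumptions, the larger block size cuts the number of blocks touched by a factor of 128 for the same volume of data.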
By combining Informatica Cloud with Amazon Redshift’s parallel loading architecture, you can make use of push-down optimization, which runs data transformations inside the source or target database engine, whichever is better suited, rather than in the integration layer. Informatica Cloud also offers native connectivity to cloud and social media apps such as Salesforce, NetSuite, Workday, LinkedIn, and Twitter, which makes it easy to funnel data from these apps into your Amazon Redshift cluster quickly.
If you’re at the Amazon Web Services Summit today in New York City, then you heard our announcement that Informatica Cloud is offering a free 60-day trial for Amazon Redshift with no limitations on the number of rows, jobs, application endpoints, or scheduling. If you’d like to learn more, please visit our Redshift Trial page or go directly to the trial.
With practically every on-premise application having a counterpart in the SaaS world, enterprise IT departments have truly made the leap to a new way of computing that is transforming their organizations. The last mile of cloud transformation lies in the field of integration, and it is for this purpose that Informatica had a dedicated Cloud Day this year at Informatica World 2014.
The day kicked off with an introduction by Ronen Schwartz, VP and GM of Informatica Cloud, to the themes of intelligent data integration, comprehensive cloud data management, and cloud process automation. His point was that because SaaS applications are customized frequently, and because the business needs more data insights from these apps, it is important to have a single platform that excels at both batch and real-time integration. A series of panel discussions followed, ranging from mission-critical Salesforce.com integration, to cloud data warehouses, to hybrid integration use cases involving Informatica PowerCenter and Informatica Cloud.
In the mission-critical Salesforce.com integration panel, we had speakers from Intuit, InsideTrack, and Cloud Sherpas. Intuit talked about how they went live with Informatica Cloud in under four weeks, with only two developers on hand. InsideTrack had an interesting use case: they were using the Force.com platform to build a native app that tracked the performance of students and the impact of coaching on them. InsideTrack connected to several databases outside the Salesforce platform to perform sophisticated analytics and bring the results into their app through Informatica Cloud. Cloud Sherpas, a premier system integrator and close partner of both Salesforce.com and Informatica, outlined three customer case studies of how they used Informatica Cloud to solve complex integration challenges. The first was a medical devices company that was trying to receive up-to-the-minute price quotes by integrating Salesforce and SAP; the second was a global pharmaceuticals company that was using Salesforce to capture data about their research subjects and needed to synchronize that information with their databases; and the third was Salesforce.com itself.
The die-hard data geeks came out in full force for the cloud data warehousing panel. Accomplished speakers from MicroStrategy, Amazon, and The Weather Channel discussed data warehousing using Amazon Web Services. A first-time attendee at this panel might have assumed that cloud data warehousing simply meant running relational databases on virtual machines spun up from EC2, but instead participants were enthralled to learn that Amazon Redshift is a relational database that runs 100% in the cloud. The Weather Channel uses Amazon Redshift to perform analytics on almost 750 million rows of data. Using Informatica Cloud, they can load this data into Redshift in a mere half hour. MicroStrategy talked about their cloud analytics initiatives and how they looked at them holistically from a hybrid standpoint.
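To put The Weather Channel figures in perspective, the short Python calculation below converts them into an approximate load rate. It treats "almost 750 million rows" as exactly 750 million and "a mere half hour" as exactly 30 minutes, so the result is a rough upper estimate rather than a measured benchmark.

```python
rows = 750_000_000    # "almost 750 million rows", rounded up
seconds = 30 * 60     # "a mere half hour"

rows_per_second = rows / seconds
print(f"{rows_per_second:,.0f} rows/second")   # prints: 416,667 rows/second
```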
On that note, it was time for the panel of hybrid integration practitioners to take the stage, with Qualcomm and Conde Nast discussing their use of PowerCenter and Cloud. Qualcomm emphasized that the value of Informatica Cloud was the easy access to a variety of connectors, and that they were using connectors for Salesforce, NetSuite, several relational databases, and web services. Conde Nast mentioned that it was extremely easy to port mappings between PowerCenter and Cloud due to the common code base between the two.
An explosion in mobile devices and social media usage has been the driving force behind large brands using big data solutions for deep, insightful analytics. In fact, a recent mobile consumer survey found that 71% of people used their mobile devices to access social media.
With social media becoming a major avenue for advertising, and mobile devices being the medium of access, there are numerous data points that global brands can cross-reference to get a more complete picture of their consumer, and their buying propensities. Analyzing these multitudes of data points is the reason behind the rise of big data solutions such as Hadoop.
However, Hadoop itself is only one Big Data framework, and it comes in several different flavors. Facebook, which called itself the owner of the world’s largest Hadoop cluster, at 100 petabytes, outgrew its capabilities on Hadoop and is looking into a technology that would allow it to abstract its Hadoop workloads across several geographically dispersed datacenters.
When it comes to analytics projects that require intensive data warehousing, there is no one-size-fits-all answer for Big Data, as the use cases can be extremely varied, ranging from short-term to long-term. Deploying Hadoop clusters requires specialized skills and proper capacity planning. In contrast, Big Data solutions in the cloud such as Amazon Redshift allow users to provision database nodes on demand and in a matter of minutes, without large outlays on infrastructure such as servers and datacenter space. As a result, cloud-based Big Data can be a viable alternative for short-term analytics projects, as well as for sandbox environments used to test out larger Big Data integration projects. Cloud-based Big Data may also make sense in situations where only a subset of the data is required for analysis as opposed to the entire dataset.
With cloud integration, much of the complexity of connecting to data sources and targets is abstracted away. Consequently, when a cloud-based Big Data deployment is combined with a cloud integration solution, it can result in even more time and cost savings and get the projects off the ground much faster.
We’ll be discussing several use cases around cloud-based Big Data in our webinar on August 22nd, Big Data in the Cloud with Informatica Cloud and Amazon Redshift, with special guests from Amazon on the event.