Tag Archives: amazon redshift
Informatica’s Redshift connector is a state-of-the-art bulk-load connector that allows users to perform all CRUD operations on Amazon Redshift. It follows AWS best practices to load data securely at high throughput and is available on Informatica Cloud and PowerCenter.
Today we are excited to announce support for Amazon’s newly launched custom JDBC and ODBC drivers for Redshift. Both drivers are certified for Linux and Windows environments.
Informatica’s Redshift connector will package the JDBC 4.1 driver, which further enhances our metadata-fetch capabilities for tables and views in Redshift and improves our overall design-time responsiveness by over 25%. It also allows us to query multiple tables and views and retrieve the result set using primary and foreign key relationships.
Amazon’s ODBC driver enhances our full Push Down Optimization capabilities on Redshift. Key improvements include support for the SYSDATE variable and for functions such as ADD_TO_DATE(), ASCII(), CONCAT(), LENGTH(), TO_DATE(), and VARIANCE(), none of which could be pushed down before.
Amazon’s ODBC driver is not pre-packaged, but it can be downloaded directly from Amazon’s S3 store.
Once it is installed, the user can change the default ODBC System DSN in the ODBC Data Source Administrator.
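On Linux, the equivalent of that System DSN step is an entry in odbc.ini. The sketch below is illustrative only: the DSN name, driver install path, cluster endpoint, and database name are placeholders you would replace with your own values.

```ini
[Redshift-DSN]
; Path assumes Amazon's default 64-bit driver install location
Driver=/opt/amazon/redshiftodbc/lib/64/libamazonredshiftodbc64.so
; Your cluster endpoint, port, and database from the Redshift console
Server=examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com
Port=5439
Database=dev
```

On Windows, the same values are entered through the ODBC Data Source Administrator UI rather than a file.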
It’s amazing how fast a year goes by. Last year, Informatica Cloud exhibited at Amazon re:Invent for the very first time where we showcased our connector for Amazon Redshift. At the time, customers were simply kicking the tires on Amazon’s newest cloud data warehousing service, and trying to learn where it might make sense to fit Amazon Redshift into their overall architecture. This year, it was clear that customers had adopted several AWS services and were truly “all-in” on the cloud. In the words of Andy Jassy, Senior VP of Amazon Web Services, “Cloud has become the new normal”.
During Day 1 of the keynote, Andy outlined several areas of growth across the AWS ecosystem such as a 137% YoY increase in data transfer to and from Amazon S3, and a 99% YoY increase in Amazon EC2 instance usage. On Day 2 of the keynote, Werner Vogels, CTO of Amazon made the case that there has never been a better time to build apps on AWS because of all the enterprise-grade features. Several customers came on stage during both keynotes to demonstrate their use of AWS:
- Major League Baseball’s Statcast application consumed 17PB of raw data
- Philips Healthcare used over a petabyte a month
- Intuit revealed their plan to move the rest of their applications to AWS over the next few years
- Johnson & Johnson outlined their use of Amazon’s Virtual Private Cloud (VPC) and referred to their use of hybrid cloud as the “borderless datacenter”
- Omnifone illustrated how AWS has the network bandwidth required to deliver their hi-res audio offerings
- The Weather Company scaled AWS across 4 regions to deliver 15 billion forecast publications a day
Informatica was also mentioned on stage by Andy Jassy as one of the premier ISVs that had built solutions on top of the AWS platform. Indeed, from having one connector in the AWS ecosystem last year (for Amazon Redshift), Informatica has released native connectors for Amazon DynamoDB, Elastic MapReduce (EMR), S3, Kinesis, and RDS.
With so many customers using AWS, it becomes hard for them to track their usage at a granular level. This is especially true for enterprises, where many departments and business units each consume several AWS services. Informatica Cloud and Tableau developed a joint solution, showcased at the Amazon re:Invent Partner Theater, that lets an IT operations analyst drill down across several dimensions to answer questions about AWS usage and cost. IT Ops personnel can pick out the relevant data points in their data model, such as availability zone, rate, and usage type, to name a few, and use Amazon Redshift as the cloud data warehouse that aggregates this data. Informatica Cloud’s Vibe Integration Packages, combined with its native connectivity to Amazon Redshift and S3, allow the data model to be reflected as the correct set of tables in Redshift. Tableau’s robust visualization capabilities then let users drill into the data model to extract whatever insights they require. Look for more from Informatica Cloud and Tableau on this joint solution in the coming weeks and months.
Amazon Redshift, one of the fast-rising stars in the AWS ecosystem, has taken the data warehousing world by storm ever since it was introduced almost two years ago. Amazon Redshift operates completely in the cloud and allows you to provision nodes on demand. This model allows you to overcome many of the pains associated with traditional data warehousing techniques, such as provisioning extra server hardware, sizing and preparing databases for loading, or extensive SQL scripting.
However, loading data into Redshift in a timely manner can be challenging. Reducing load times can mean spending a tremendous amount of effort hand-optimizing SQL, which takes away the value proposition of using Redshift in the first place.
Informatica Cloud helps you load this data into Redshift in just a few minutes. To start using Informatica Cloud, you’ll first need to establish connections to Redshift and to your other data sources. Here are a few easy steps to get started with connections from a relational database such as MySQL, as well as Redshift, in Informatica Cloud:
- Log in to your Informatica Cloud account, go to Configure -> Connections, click “New”, and select “MySQL” for “Type”
- Select your Secure Agent and fill in the rest of the database details:
- Test your connection, then click “OK” to save and exit
- Now, log in to your AWS account and go to the Redshift service page
- Go to your cluster configuration page and make a note of the cluster and cluster database properties: Number of Nodes, Endpoint, Port, Database Name, JDBC URL. You will also need:
- The Redshift database user name and password (which is different from your AWS account)
- AWS account Access Key
- AWS account Secret Key
- Exit the AWS console.
- Now, back in your Informatica Cloud account, go to Configure -> Connections and click “New”.
- Select “AWS Redshift (Informatica)” for “Type” and fill in the rest of the details using the information you gathered above
- Test the connection, then click “OK” to save and exit
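The AWS Access Key and Secret Key in the steps above are needed because high-throughput loads into Redshift stage data in S3 and then issue a COPY command, which Redshift ingests in parallel across the cluster. As a rough sketch of what such a load boils down to (the table name, bucket path, and credential values here are placeholders, and `build_copy_sql` is an illustrative helper, not part of any product):

```python
def build_copy_sql(table, s3_path, access_key, secret_key, delimiter=","):
    """Build a Redshift COPY statement that bulk-loads a file staged in S3.

    COPY is Redshift's fastest load path: the cluster reads the staged
    file in parallel. The access key and secret key are the AWS account
    credentials noted in the connection steps above.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        f"DELIMITER '{delimiter}'"
    )

sql = build_copy_sql(
    "public.orders", "s3://my-bucket/orders.csv", "AKIAEXAMPLE", "SECRETEXAMPLE"
)
```

You would run the resulting statement over the JDBC or ODBC connection defined above; a tool like Informatica Cloud generates and executes the equivalent for you.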
As you can see, establishing connections was extremely easy and can be done in less than 5 minutes. To learn how customers such as UBM used Informatica Cloud to deliver next-generation customer insights with Amazon Redshift, please join us on September 16 for a webinar where we’ll have product experts from Amazon and UBM explaining how your company can benefit from cloud data warehousing for petabyte-scale analytics using Amazon Redshift.
As Informatica Cloud product managers, we spend a lot of our time thinking about things like relational databases. Recently, we’ve been considering their limitations, and, specifically, how difficult and expensive it is to provision an on-premise data warehouse to handle the petabytes of fluid data generated by cloud applications and social media. As a result, companies often have to make tradeoffs and decide which data is worth putting into their data warehouse.
Certainly, relational databases have enormous value. They’ve been around for several decades and have served as a bulwark for storing and analyzing structured data. Without them, we wouldn’t be able to extract and store data from on-premise CRM, ERP and HR applications and push it downstream for BI applications to consume.
With the advent of cloud applications and social media however, we are now faced with managing a daily barrage of massive amounts of rapidly changing data, as well as the complexities of analyzing it within the same context as data from on-premise applications. Add to that the stream of data coming from Big Data sources such as Hadoop which then needs to be organized into a structured format so that various correlation analyses can be run by BI applications – and you can begin to understand the enormity of the problem.
Up until now, the only solution has been to throw development resources at legacy on-premise databases, and hope for the best. But given the cost and complexity, this is clearly not a sustainable long-term strategy.
As an alternative, Amazon Redshift, a petabyte-scale data warehouse service in the cloud, has the right combination of performance and capabilities to handle the demands of social media and cloud app data without the additional complexity or expense. Its Massively Parallel Processing (MPP) architecture allows for lightning-fast loading and querying of data. It also features a larger block size, which reduces the number of I/O requests needed to load data and leads to better performance.
By combining Informatica Cloud with Amazon Redshift’s parallel loading architecture, you can make use of push-down optimization algorithms, which process data transformations in the most optimal source or target database engines. Informatica Cloud also offers native connectivity to cloud and social media apps, such as Salesforce, NetSuite, Workday, LinkedIn, and Twitter, to name a few, which makes it easy to funnel data from these apps into your Amazon Redshift cluster at faster speeds.
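One reason parallel loading matters: Redshift ingests fastest when the input is split into multiple staged files so that every slice in the cluster loads a share at once. A minimal sketch of that splitting step, assuming a hypothetical `split_for_parallel_load` helper (the slice count is illustrative; a real pipeline would pick it from the cluster's node configuration):

```python
def split_for_parallel_load(rows, num_slices):
    """Split rows into num_slices roughly equal chunks so each Redshift
    slice can ingest one staged file in parallel via COPY.
    Round-robin assignment keeps the chunk sizes balanced."""
    chunks = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        chunks[i % num_slices].append(row)
    return chunks

# 10 rows across a 4-slice cluster -> chunk sizes 3, 3, 2, 2
chunks = split_for_parallel_load(list(range(10)), 4)
```

Each chunk would then be written to its own S3 object before the COPY runs; integration tools automate this staging for you.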
If you’re at the Amazon Web Services Summit today in New York City, then you heard our announcement that Informatica Cloud is offering a free 60-day trial for Amazon Redshift with no limitations on the number of rows, jobs, application endpoints, or scheduling. If you’d like to learn more, please visit our Redshift Trial page or go directly to the trial.
With practically every on-premise application having a counterpart in the SaaS world, enterprise IT departments have truly made the leap to a new way of computing that is transforming their organizations. The last mile of cloud transformation lies in the field of integration, and it is for this purpose that Informatica had a dedicated Cloud Day this year at Informatica World 2014.
The day kicked off with an introduction by Ronen Schwartz, VP and GM of Informatica Cloud, to the themes of intelligent data integration, comprehensive cloud data management, and cloud process automation. The point was made that with SaaS applications being customized frequently, and the need for more data insights from these apps, it is important to have a single platform that can excel at both batch and real-time integration. A whole series of exciting panel discussions followed, ranging from mission critical Salesforce.com integration, to cloud data warehouses, to hybrid integration use cases involving Informatica PowerCenter and Informatica Cloud.
In the mission critical Salesforce.com integration panel, we had speakers from Intuit, InsideTrack, and Cloud Sherpas. Intuit talked about how they went live with Informatica Cloud in under four weeks, with only two developers on hand. InsideTrack had an interesting use case wherein they used the Force.com platform to build a native app that tracked the performance of students and the impact of coaching on them. InsideTrack connected to several databases outside the Salesforce platform to perform sophisticated analytics and bring them into their app through the power of Informatica Cloud. Cloud Sherpas, a premier System Integrator and close partner of both Salesforce.com and Informatica, outlined three customer case studies of how they used Informatica Cloud to solve complex integration challenges. The first was a medical devices company that was trying to receive up-to-the-minute price quotes by integrating Salesforce and SAP, the second was a global pharmaceuticals company that was using Salesforce to capture data about their research subjects and needed to synchronize that information with their databases, and the third was Salesforce.com itself.
The die-hard data geeks came out in full force for the cloud data warehousing panel. Accomplished speakers from Microstrategy, Amazon, and The Weather Channel discussed data warehousing using Amazon Web Services. A first-time attendee to this panel might have assumed that cloud data warehousing simply meant running relational databases on virtual machines spun up from EC2, but participants were enthralled to learn that Amazon Redshift is a relational database that runs 100% in the cloud. The Weather Channel uses Amazon Redshift to perform analytics on almost 750 million rows of data. Using Informatica Cloud, they can load this data into Redshift in a mere half hour. Microstrategy talked about their cloud analytics initiatives and how they look at them holistically from a hybrid standpoint.
On that note, it was time for the panel of hybrid integration practitioners to take the stage, with Qualcomm and Conde Nast discussing their use of PowerCenter and Cloud. Qualcomm emphasized that the value of Informatica Cloud was the easy access to a variety of connectors, and that they were using connectors for Salesforce, NetSuite, several relational databases, and web services. Conde Nast mentioned that it was extremely easy to port mappings between PowerCenter and Cloud due to the common code base between the two.
An explosion in mobile devices and social media usage has been the driving force behind large brands using big data solutions for deep, insightful analytics. In fact, a recent mobile consumer survey found that 71% of people used their mobile devices to access social media.
With social media becoming a major avenue for advertising, and mobile devices being the medium of access, there are numerous data points that global brands can cross-reference to get a more complete picture of their consumer, and their buying propensities. Analyzing these multitudes of data points is the reason behind the rise of big data solutions such as Hadoop.
However, Hadoop itself is only one Big Data framework, and it comes in several different flavors. Facebook, which has called itself the owner of the world’s largest Hadoop cluster at 100 petabytes, has outgrown Hadoop’s capabilities and is looking into technology that would allow it to abstract its Hadoop workloads across several geographically dispersed datacenters.
When it comes to analytics projects that require intensive data warehousing, there is no one-size-fits-all answer for Big Data, as the use cases can vary widely, from short-term to long-term. Deploying Hadoop clusters requires specialized skills and proper capacity planning. In contrast, Big Data solutions in the cloud such as Amazon Redshift allow users to provision database nodes on demand in a matter of minutes, without large outlays for infrastructure such as servers and datacenter space. As a result, cloud-based Big Data can be a viable alternative for short-term analytics projects, as well as for sandbox environments used to test larger Big Data integration projects. Cloud-based Big Data may also make sense when only a subset of the data is needed for analysis rather than the entire dataset.
With cloud integration, much of the complexity of connecting to data sources and targets is abstracted away. Consequently, when a cloud-based Big Data deployment is combined with a cloud integration solution, it can result in even more time and cost savings and get the projects off the ground much faster.
We’ll be discussing several use cases for cloud-based Big Data in our August 22nd webinar, Big Data in the Cloud with Informatica Cloud and Amazon Redshift, with special guests from Amazon joining the event.