Category Archives: Cloud Data Integration
The technology you use in your business can either help or hinder your business objectives.
In the past, slow and manual processes had an inhibiting effect on customer services and sales interactions, thus dragging down the bottom line.
Now, with cloud technology and customers interacting at record speeds, companies expect greater returns from each business outcome. What do I mean when I say business outcome?
Well, according to Bluewolf’s State of Salesforce Report, you can split these into four categories: acquisition, expansion, retention and cost reduction.
With the right technology and planning, a business can speedily acquire more customers, expand to new markets, increase customer retention and ensure they are doing all of this efficiently and cost effectively. But what happens when the data, or the way you’re interacting with these technologies, grows unchecked or becomes corrupted and unreliable?
With data being the new fuel for decision-making, you need to make sure it’s clean, safe and reliable.
With clean data, Salesforce customers, in the above-referenced Bluewolf survey, reported efficiency and productivity gains (66%), improved customer experience (34%), revenue growth (32%) and cost reduction (21%) in 2014.
It’s been said that it costs a business 10X more to acquire new customers than it does to retain existing ones. But, despite the additional cost, real continued growth requires the acquisition of new customers.
Gaining new customers, however, requires a great sales team who knows what and to whom they’re selling. With Salesforce, you have that information at your fingertips, and the chance to let your sales team be as good as they can possibly be.
And this is where having good data fits in and becomes critically important. Because, well, you can have great technology, but it’s only going to be as good as the data you’re feeding it.
The same “garbage in, garbage out” maxim holds true for practically any data-driven or data-reliant business process or outcome, whether it’s attracting new customers or building a brand. And with the Salesforce Sales Cloud and Marketing Cloud you have the technology to both attract new customers and build great brands, but if you’re feeding your Clouds with inconsistent and fragmented data, you can’t trust that you’ve made the right investments or decisions in the right places.
The combination of good data and technology can help to answer so many of your critical business questions. How do I target my audience without knowledge of previous successes? What does my ideal customer look like? What did they buy? Why did they buy it?
For better or worse (though mainly better), answering those questions with just your intuition and experience is pretty much out of the question. Without the tools to look at, for example, past campaigns and sales, and to combine those views to see who your real market is, you’ll never be fully effective.
The same is true for sales. Without the right Leads, and without the ability to interact with those Leads effectively, i.e., having the right contact details and company, and knowing there’s only one version of that record, the discovery process can be long and painful.
But customer acquisition isn’t the only place where data plays a vital role.
When expanding to new markets or upselling and cross selling to existing customers, it’s the data you collect and report on that will help inform where you should focus your efforts.
Knowing what existing relationships you can leverage can make the difference between proactively offering solutions to your customers and losing them to a competitor. With Salesforce’s Analytics Cloud, this visibility that used to take weeks and months to view can now be put together in a matter of minutes. But how do you make strategic decisions on what market to tap into or what relationships to leverage, if you can only see one or two regions? What if you could truly visualize how you interact with your customers? Or see beyond the hairball of interconnected business hierarchies and interactions to know definitively what subsidiary, household or distributor has what? Seeing the connections you have with your customers can help uncover the white space that you could tap into.
Naturally this entire process means nothing if you’re not actually retaining these customers. Again, this is another area that is fueled by data. Knowing who your customers are, what issues they’re having and what they could want next can help ensure you are always providing your customers with the ultimate experience.
Last, but by no means least, there is cost reduction. Only by ensuring that all of this data is clean — and continuously cleansed — and your Cloud technologies are being fully utilized, can you then help ensure the maximum return on your Cloud investment.
Learn more about how Informatica Cloud can help you maximize your business outcomes through ensuring your data is trusted in the Cloud.
Like me, you probably just returned from an inspiring Sales Kick Off 2015 event. You’ve invested in talented people. You’ve trained them with the skills and knowledge they need to identify, qualify, validate, negotiate and close deals. You’ve invested in world-class applications, like Salesforce Sales Cloud, to empower your sales team to sell more effectively. But does your sales team have what they need to succeed in 2015?
Gartner predicts that as early as next year, companies will compete primarily on the customer experiences they deliver. So, every customer interaction counts. Knowing your customers is key to delivering great sales experiences.
But inaccurate, inconsistent and disconnected customer information may be holding your sales team back from delivering great sales experiences. If you’re not fueling Salesforce Sales Cloud (or another Sales Force Automation (SFA) application) with clean, consistent and connected customer information, your sales team may be at a disadvantage against the competition.
To successfully compete and deliver great sales experiences more efficiently, your sales team needs a complete picture of their customers. They don’t want to pull information from multiple applications and then reconcile it in spreadsheets. They want direct access to the Total Customer Relationship across channels, touch points and products within their Salesforce Sales Cloud.
Watch this short video comparing a day-in-the-life of two sales reps competing for the same business. One has access to the Total Customer Relationship in Salesforce Sales Cloud, the other does not. Watch now: Salesforce.com with Clean, Consistent and Connected Customer Information
Is your sales team spending time creating spreadsheets by pulling together customer information from multiple applications and then reconciling it to understand the Total Customer Relationship across channels, touch points and products? If so, how much is it costing your business? Or is your sales team engaging with customers without understanding the Total Customer Relationship? How much is that costing your business?
Many innovative sales leaders are gaining a competitive edge by better leveraging their customer data to empower their sales teams to deliver great sales experiences. They are fueling business and analytical applications, like Salesforce Sales Cloud, with clean, consistent and connected customer information. They are arming their sales teams with direct access to richer customer profiles, which includes the Total Customer Relationship across channels, touch points and products.
What measurable results have these sales leaders achieved? Merrill Lynch boosted sales productivity by 15%, resulting in $50M in annual impact. A $60B manufacturing company improved cross-sell and up-sell success by 5%. Logitech increased sales across channels: online, in their retail partners’ stores and through distribution partners.
This year, I believe more sales leaders will focus on leveraging their customer information for competitive advantage. This will help them shift from sales automation to sales optimization. What do you think?
A friend of mine recently reached out to me for advice on CRM solutions in the market. Though I have never worked for a CRM vendor, I’ve had direct experience with such solutions, from working for companies that implemented them to my current role interacting with large and small organizations across industries about the data requirements supporting their ongoing application investments. As we spoke, memories started to surface from when he and I implemented Salesforce.com (SFDC) many years ago. Memories we wanted to forget, but important to call out given his new situation.
We worked together for a large mortgage lending software vendor selling loan origination solutions to brokers and small lenders, mainly through email and snail mail marketing. He was responsible for Marketing Operations, and I ran Product Marketing. The company looked at Salesforce.com to help streamline our sales operations and improve how we marketed to and serviced our customers. The existing CRM system dated from the early ’90s, and though it did what the company needed it to do, it was heavily customized, costly to operate, and had served its useful life. It was time to upgrade, to help grow the business, improve business productivity, and enhance customer relationships.
After 90 days of rolling out SFDC, we ran into some old familiar problems across the business. Sales reps continued to struggle to know who was a current customer using our software, marketing managers could not create quality mailing lists for prospecting purposes, and call center reps were not able to tell if the person on the other end was a customer or a prospect. Everyone wondered why this was happening given we had adopted the best CRM solution in the market. You can imagine the heartburn and ulcers we all had after making such a huge investment in our new CRM solution. C-level executives were questioning our decisions and blaming the applications. The truth was, the issues were not related to SFDC. They were caused by the data we had migrated into the system, and by the lack of proper governance and a capable information architecture to support the data management and integration required between systems.
During the implementation phase, IT imported our entire customer database of 200K+ unique customer entities from the old system into SFDC. Unfortunately, the mortgage industry was very transient: on average there were only roughly 55K licensed mortgage brokers and lenders in the market. Because no one had ever validated who was really a customer vs. someone who had at some point bought our product, we had serious data quality issues, including:
- Trial users whose evaluation copies of our products had expired were tagged as current customers
- Duplicate records caused by manual data entry errors: companies with similar names, entered slightly differently but with the same business address, were tagged as unique customers
- Subsidiaries of parent companies in different parts of the country were each tagged as unique customers
- Lastly, we imported the marketing contact database of prospects, who were incorrectly accounted for as customers in the new system
We also failed to integrate real-time purchasing data and information from our procurement systems, which sales and support needed to handle customer requests. Instead of integrating that data in real time with proper technology, IT manually loaded these records at the end of each week via FTP, resulting in incorrect billing information, broken statement processing, and a ton of complaints from customers through our call center. The price we paid for not paying attention to our data quality and integration requirements before we rolled out Salesforce.com was significant for a company of our size. For example:
- Marketing got hit pretty hard. Each quarter we mailed evaluation copies of new products to our customer database of 200K, each costing the company $12 to produce and mail. Total cost = $2.4M annually. Because we had such bad data, we would get 60% of our mailings returned due to invalid addresses or wrong contact information. The cost of bad data to marketing = $1.44M annually.
- Next, Sales struggled miserably when trying to upgrade customers by running cold call campaigns using the names in the database. As a result, sales productivity dropped by 40%, and we experienced over 35% sales turnover that year. Within a year of using SFDC, our head of sales was let go. Not good!
- Customer support used SFDC to service customers, and our average call times were 40 min per service ticket. We believed that was “business as usual” until we surveyed how reps were spending their time each day, and over 50% said it was dealing with billing issues caused by bad contact information in the CRM system.
At the end of our conversation, this was my advice to my friend:
- Conduct a data quality audit of the systems that would interact with the CRM system. Audit how complete your critical master and reference data is, including names, addresses, customer IDs, etc.
- Do this before you invest in a new CRM system. You may find that many of the challenges you face with your existing applications are caused by data gaps rather than by the legacy application itself.
- If you have a data governance program, involve that team in the CRM initiative to ensure they understand your requirements and can see how to help.
- However, if you do decide to modernize, collaborate with and involve your IT teams, especially your Application Development teams and your Enterprise Architects, to ensure all of the best options are considered for handling your data sharing and migration needs.
- Lastly, consult with your technology partners, including your new CRM vendor; they may be working with solution providers who can help address these data issues, as you are probably not the only one in this situation.
CRM systems have come a long way in today’s Big Data and Cloud era. Many firms are adopting more flexible solutions offered through the Cloud, like Salesforce.com, Microsoft Dynamics, and others. Regardless of how old or new, on premise or in the cloud, companies invest in CRM not just to serve their sales teams or increase marketing conversion rates, but to improve their business relationships with their customers. Period! It’s about ensuring the data in these systems is trustworthy, complete, up to date, and actionable, so it can improve customer service and help drive sales of new products and services to increase wallet share. So how do you maximize the business potential of these critical business applications?
Whether you are adopting your first CRM solution or upgrading an existing one, keep in mind that Customer Relationship Management is a business strategy, not just a software purchase. It’s also about having a sound and capable data management and governance strategy supported by people, processes, and technology to ensure you can:
- Access and migrate data from old systems to new while avoiding development cost overruns and project delays.
- Identify, detect, and distribute transactional and reference data from existing systems into your front line business application in real-time!
- Manage data quality errors, including duplicate records and invalid names and contact information, through proper data governance and proactive data quality monitoring and measurement during and after deployment
- Govern and share authoritative master records of customer, contact, product, and other master data between systems in a trusted manner.
Will your data be ready for your new CRM investments? To learn more:
- Download Salesforce Integration for Dummies
- Download a new Whitepaper on how to Maximize Integration ROI with a Hybrid Approach
- Consolidating Multiple Salesforce Orgs: A Best Practice Guide
- Sign up for a 30 Day Trial of Informatica Cloud Integration
Follow me on Twitter @DataisGR8
It’s true. Data integration is a whole new game, compared to five years ago or, in some organizations, five minutes ago. The right approaches to data integration continue to evolve around a few principal forces: first, the growth of cloud computing, as pointed out by Stafford; second, the growing use of big data systems; and third, the emerging use of data as a strategic asset for the business.
These forces combine to drive us to the understanding that old approaches to data integration won’t provide the value that they once did. As someone who was a CTO of three different data integration companies, I’ve seen these patterns change over the time that I was building technology, and that change has accelerated in the last 7 years.
The core opportunities lie with enterprise architects and their ability to drive an understanding of the value of data integration, as well as to drive change within their organizations. After all, they, or the enterprise’s CTOs and CIOs (whoever makes decisions about technological approaches), are supposed to steer the organization in the technical directions that will best support the business. While most enterprise architects follow the latest hype, such as cloud computing and big data, many have missed the underlying data integration strategies and technologies that will support these changes.
“The integration challenges of cloud adoption alone give architects and developers a once in a lifetime opportunity to retool their skillsets for a long-term, successful career, according to both analysts. With the right skills, they’ll be valued leaders as businesses transition from traditional application architectures, deployment methodologies and sourcing arrangements.”
The problem is that, while most agree that data integration is important, they typically don’t understand what it is or the value it can bring. These days, many developers live in a world of instant updates. With emerging DevOps approaches and infrastructure, they really don’t get the need for, or the mechanisms required for, sharing data between application or database silos. In many instances, they resort to hand-coding interfaces between source and target systems. This leads to brittle and unreliable integration solutions, and thus hurts rather than helps new cloud application and big data deployments.
The message is clear: Those charged with defining technology strategies within enterprises need to also focus on data integration approaches, methods, patterns, and technologies. Failing to do so means that the investments made in new and emerging technology, such as cloud computing and big data, will fail to provide the anticipated value. At the same time, enterprise architects need to be empowered to make such changes. Most enterprises are behind on this effort. Now it’s time to get to work.
I think this new capability, Salesforce Lightning Connect, is an innovative development and gives OData, an OASIS standard, a leg-up on its W3C-defined competitor, Linked Data. OData is a REST-based protocol that provides access to data over the web. The fundamental data model is relational, and the query language closely resembles stripped-down SQL. This is much more familiar to most people than the RDF-based model used by Linked Data or its SPARQL query language.
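To make that comparison concrete, here is a minimal sketch of an OData read in Python. The endpoint is the public Northwind sample service hosted at services.odata.org; any conforming OData provider is queried the same way, and the entity and field names are just those of that sample data set.

```python
# Minimal OData read: the query options are the "stripped-down SQL"
# referred to above ($select ~ SELECT list, $filter ~ WHERE, $top ~ LIMIT).
import requests

BASE = "https://services.odata.org/V4/Northwind/Northwind.svc"

resp = requests.get(
    f"{BASE}/Customers",
    params={
        "$select": "CustomerID,CompanyName,Country",
        "$filter": "Country eq 'Germany'",
        "$top": "5",
        "$format": "json",
    },
)
resp.raise_for_status()

# OData V4 JSON responses wrap the result rows in a "value" array.
for row in resp.json()["value"]:
    print(row["CustomerID"], row["CompanyName"])
```

The same request in SQL would be roughly SELECT CustomerID, CompanyName FROM Customers WHERE Country = 'Germany' LIMIT 5, which is why the learning curve is so much gentler than SPARQL’s.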
Standardization of OData has been going on for years (they are working on version 4), but it has suffered from a bit of a chicken-and-egg problem. Applications haven’t put a large priority on supporting the consumption of OData because there haven’t been enough OData providers, and data providers haven’t prioritized making their data available through OData because there haven’t been enough consumers. With Salesforce, a cloud leader, declaring that it will consume OData, the equation changes significantly.
But these things take time. What does a user of Salesforce (or any other OData consumer) do if most of the data sources they have cannot be accessed through an OData provider? It is the old last-mile problem faced by any communications or integration technology. It is fine to standardize, but how do you get all the existing endpoints to conform to the standard? You need someone to do the labor-intensive work of converting lots of endpoints to the standard representation.
Informatica has been in the last-mile business for years. As it happens, the canonical model that we always used has been a relational model that lines up very well with the model used by OData. For us to host an OData provider for any of the data sources that we already support, we only needed to do one conversion from the internal format that we’ve always used to the OData standard. This OData provider capability will be available soon.
But there is also the firewall issue. The consumer of the OData has to be able to access the OData provider. So, if you want Salesforce to be able to show data from your Oracle database, you would have to open up a hole in your firewall that provides access to your database. Not many people are interested in doing that – for good reason.
Informatica Cloud’s Vibe secure agent architecture is a solution to the firewall issue that will also work with the new OData provider. The OData provider will be hosted on Informatica’s Cloud servers, but will have access to any installed secure agents. Agents require a one-time install on-premise, but are thereafter managed from the cloud and are automatically kept up-to-date with the latest version by Informatica. An agent doesn’t require a port to be opened, but instead opens up an outbound connection to the Informatica Cloud servers through which all communication occurs. The agent then has access to any on-premise applications or data sources.
OData is especially well suited to reading external data. However, there are better ways of creating or updating external data. One problem is that Salesforce Lightning Connect only handles reads; but even if it did handle writes, it isn’t usually appropriate to add data to most applications by just inserting records into tables. Usually a collection of related information must be provided for the update to make sense. To facilitate this, applications provide APIs that offer a higher level of abstraction for updates. Informatica Cloud Application Integration can be used today to read or write data to external applications from within Salesforce through the use of guides that can be displayed from any Salesforce screen. Guides make it easy to generate a friendly user interface that shows exactly the data you want your users to see, and to guide them through the collection of new or updated data that needs to be written back to your app.
It takes a village to build mainstream big data solutions. We often get so caught up in Hadoop use cases and customer successes that sometimes we don’t talk enough about the innovative partner technologies and integrations that enable our customers to put the enterprise data hub at the core of their data architecture and innovate with confidence. Cloudera and Informatica have been working together to integrate our products to enable new levels of productivity and lower deployment and production risk.
Going from Hadoop to an enterprise data hub means a number of things. It means that you recognize the business value of capturing and leveraging all your data for exploration and analytics. It means you’re ready to make the move from Hadoop pilot project to production. And it means your data is important enough that it’s worth securing and making data pipelines visible. It’s the visibility layer, and in particular the unique integration between Cloudera Navigator and Informatica, that I want to focus on in this post.
The era of big data has ushered in increased regulations in a number of industries – banking, retail, healthcare, energy – most of which deal in how data is managed throughout its lifecycle. Cloudera Navigator is the only native end-to-end solution for governance in Hadoop. It provides visibility for analysts to explore data in Hadoop, and enables administrators and managers to maintain a full audit history for HDFS, HBase, Hive, Impala, Spark and Sentry, then run reports on data access for auditing and compliance. The integration of Informatica Metadata Manager in the Big Data Edition with Cloudera Navigator extends this level of visibility and governance beyond the enterprise data hub.
Today, only Informatica and Cloudera provide end-to-end data lineage from source systems through Hadoop, and into BI/analytic and data warehouse systems. And you can view it from a single pane within Informatica.
This is important because Hadoop, and the enterprise data hub in particular, doesn’t function in a silo. It’s an integrated part of a larger enterprise-wide data management architecture. The better the insight into where data originated, where it traveled, who had access to it and what they did with it, the greater our ability to report and audit. No other combination of technologies provides this level of audit granularity.
But more than that, the visibility Cloudera and Informatica provide gives our joint customers the ability to confidently stand up an enterprise data hub as part of their production enterprise infrastructure, because they can verify the integrity of the data that undergirds their analytics. I encourage you to check out a demo of the Informatica-Cloudera Navigator integration at this link: http://infa.media/1uBpPbT
You can also check out a demo and learn a little more about Cloudera Navigator and the Informatica integration in the recorded TechTalk hosted by Informatica at this link:
Building an Enterprise Data Hub with Proper Data Integration
Data flows into the enterprise from many sources, in many formats, sizes, and levels of complexity. And as enterprise architectures have evolved over the years, traditional data warehouses have become less a final staging center for data and more one component of the enterprise that interfaces with significant data flows. But since data warehouses should focus on being powerful engines for high-value analytics, they should not be the central hub for data movement and data preparation (e.g. ETL/ELT), especially for the newer data types – such as social media, clickstream data, sensor data, internet-of-things data, etc. – that are in use today.
When you start to see data warehouse capacity consumed too quickly, performance degrading to the point where end users complain about slower response times, and a risk of not meeting your service-level agreements, it might be time to consider an enterprise data hub (EDH). With an EDH, especially one built on Apache™ Hadoop®, you can plan a strategy around data warehouse optimization to get better use out of your entire enterprise architecture.
Of course, whenever you add another new technology to your data center, you care about interoperability. And since many systems in today’s architectures interoperate via data flows, it’s clear that sophisticated data integration technologies will be an important part of your EDH strategy. Today’s big data presents new challenges related to a wide variety of data types and formats, and the right technologies are needed to glue all the pieces together, whether those pieces are data warehouses, relational databases, Hadoop, or NoSQL databases.
Choosing a Data Integration Solution
Data integration software, at a high level, has one broad responsibility: to help you process and prepare your data with the right technology. This means it has to get your data to the right place in the right format in a timely manner. So it actually includes many tasks, but the end result is that timely, trusted data can be used for decision-making and risk management throughout the enterprise. You end up with a complete, ready-for-analysis picture of your business, as opposed to segmented snapshots based on a limited data set.
When evaluating a data integration solution for the enterprise, look for:
- Ease of use to boost developer productivity
- A proven track record in the industry
- Widely available technology expertise
- Experience with production deployments with newer technologies like Hadoop
- Ability to reuse data pipelines across different technologies (e.g. data warehouse, RDBMS, Hadoop, and other NoSQL databases)
Data integration is only part of the story. When you’re depending on data to drive business decisions and risk management, you clearly want to ensure the data is reliable. Data governance, data lineage, data quality, and data auditing remain as important topics in an EDH. Oftentimes, data privacy regulatory demands must be met, and the enterprise’s own intellectual property must be protected from accidental exposure.
To help ensure that data is sound and secure, look for a solution that provides:
- Centralized management and control
- Data certification prior to publication, transparent data and integration processes, and the ability to track data lineage
- Granular security, access controls, and data masking to protect data both in transit and at the source to prevent unauthorized access to specific data sets
Informatica is the data integration solution selected by many enterprises. Informatica’s family of enterprise data integration, data quality, and other data management products can manage data of any format, complexity level, or size, from any business system, and then deliver that data across the enterprise at the desired speed.
Watch the latest Gartner video to see Todd Goldman, Vice President and General Manager for Enterprise Data Integration at Informatica, as well as executives from Cisco and MapR, give their perspective on how businesses today can gain even more value from big data.
With the Winter 2015 Release, Informatica Cloud Advances Real Time and Batch Integration for Citizen Integrators Everywhere
The Winter 2015 release brings three major platform enhancements. The first of these is in the area of connectivity and brings a whole new set of features and capabilities to those who use our platform to connect with Salesforce, Amazon Redshift, NetSuite and SAP.
Starting with Amazon, the Winter 2015 release leverages the new Redshift Unload Command, giving any user the ability to securely perform bulk queries, and quickly scan and place multiple columns of data in the intended target, without the need for ODBC or JDBC connectors. We also ensure the data is encrypted at rest in the S3 bucket while loading data into Redshift tables; this provides an additional layer of security around your data.
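For readers who haven’t seen it, the sketch below shows what Redshift’s UNLOAD command looks like when issued directly from Python via psycopg2 (Redshift speaks the PostgreSQL wire protocol). The cluster endpoint, credentials, bucket, and table names are all illustrative placeholders, and this is the raw SQL mechanism itself, not the Informatica Cloud connector.

```python
# Hedged sketch: bulk-export a table to S3 with Redshift's UNLOAD command.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="admin",
    password="<password>",  # placeholder
)

unload_sql = """
    UNLOAD ('SELECT order_id, customer_id, amount FROM orders')
    TO 's3://example-bucket/exports/orders_'
    CREDENTIALS 'aws_access_key_id=<KEY>;aws_secret_access_key=<SECRET>;master_symmetric_key=<BASE64_KEY>'
    ENCRYPTED        -- files land on S3 encrypted with the key above
    DELIMITER ','
    PARALLEL ON;     -- one output file per slice for fast bulk export
"""

with conn:
    with conn.cursor() as cur:
        cur.execute(unload_sql)
```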
For SAP, we’ve added the ability to balance the load across all application servers. With the new enhancement, we use a Type B connection to route our integration workflows through an SAP message server, which then connects with any available SAP application server. Now if an application server goes down, your integration workflows won’t go down with it. Instead, you’ll automatically be connected to the next available application server.
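As an illustration of the difference, here is a hedged sketch of the two SAP RFC connection types using the open-source pyrfc library; the host names, system ID, logon group, and credentials are placeholders, and this shows the underlying SAP mechanism rather than the Informatica connector’s own configuration.

```python
# Type A vs. Type B SAP RFC connections (pyrfc; all values are placeholders).
from pyrfc import Connection

# Type A: direct connection to one application server. If that server
# goes down, anything using this connection goes down with it.
direct = Connection(
    ashost="sap-app01.example.com", sysnr="00",
    client="100", user="INTEG_USER", passwd="<password>",
)

# Type B: connect via the message server and a logon group. The message
# server routes each session to an available application server, which
# is the load balancing described above.
balanced = Connection(
    mshost="sap-msg.example.com", msserv="3600",
    sysid="PRD", group="PUBLIC",
    client="100", user="INTEG_USER", passwd="<password>",
)
```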
Additionally, we’ve expanded the capability of our SAP connector by adding support for ECC5. While our connector supported ECC6 out of the box, ECC5 is still used by a number of our enterprise customers. The expanded support now provides them with the full coverage they and many other larger companies need.
Finally, for Salesforce, we’re updating to the newest versions of their APIs (Version 31) to ensure you have access to the latest features and capabilities. The upgrades are part of an aggressive roadmap strategy, which places updates of connectors to the latest APIs on our development schedule the instant they are announced.
The second major platform enhancement for the Winter 2015 release has to do with our Cloud Mapping Designer and is sure to please those familiar with PowerCenter. With the new release, PowerCenter users can perform secure hybrid data transformations – and sharpen their cloud data warehousing and data analytic skills – through a familiar mapping and design environment and interface.
Specifically, the new enhancement enables you to take a mapplet you’ve built in PowerCenter and bring it directly into the Cloud Mapping Designer, without any additional steps or manipulations. With the PowerCenter mapplets, you can perform multi-group transformations on objects, such as BAPIs. When you access the Mapplet via the Cloud Mapping Designer, the groupings are retained, enabling you to quickly visualize what you need, and navigate and map the fields.
Additional productivity enhancements to the Cloud Mapping Designer extend the lookup and sorting capabilities and give you the ability to upload or delete data automatically based on specific conditions you establish for each target. And with the new feature supporting fully parameterized, unconnected lookups, you’ll have increased flexibility in runtime to do your configurations.
The third and final major Winter release enhancement is to our Real Time capability. Most notable is the addition of three new features that improve the usability and functionality of the Process Designer.
The first of these is a new “Wait” step type. This new feature applies to both processes and guides and enables the user to add a time-based condition to an action within a service or process call step, and indicate how long to wait for a response before performing an action.
When used in combination with the Boundary timer event variation, the Wait step can be added to a service call step or sub-process step to interrupt the process or enable it to continue.
The second is a new select feature in the Process Designer which lets users create their own service connectors. Now when a user is presented with multiple process objects created when the XML or JSON is returned from a service, he or she can select the exact ones to include in the connector.
An additional Generate Process Objects feature automates the creation of objects, thus eliminating the tedious task of replicating whole service responses containing hierarchical XML and JSON data for large structures. These can now be conveniently auto-generated when testing a Service Connector, saving integration developers a lot of time.
The final enhancement for the Process Designer makes it simpler to work with XML-based services. The new “Simplified XML” feature for the “Get From” field treats attributes as children, removing the namespaces and making sibling elements into an object list. Now if a user only needs part of the returned XML, they just have to indicate the starting point for the simplified XML.
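To give a feel for what such a simplification does, here is a rough Python sketch of the same idea; it is not the Process Designer’s actual implementation, just an illustration of the transformation described above (namespaces dropped, attributes treated as children, repeated siblings collapsed into a list).

```python
# Illustrative "simplified XML" transformation (not Informatica's code).
import xml.etree.ElementTree as ET

def simplify(elem):
    out = dict(elem.attrib)                 # attributes become children
    for child in elem:
        tag = child.tag.split("}")[-1]      # strip the "{namespace}" prefix
        if len(child) or child.attrib:
            value = simplify(child)
        else:
            value = child.text or ""        # text-only elements become strings
        if tag in out:                      # repeated siblings -> object list
            if not isinstance(out[tag], list):
                out[tag] = [out[tag]]
            out[tag].append(value)
        else:
            out[tag] = value
    return out

doc = ET.fromstring(
    '<ns:order xmlns:ns="urn:example" id="7">'
    '<ns:item sku="A1"/><ns:item sku="B2"/></ns:order>'
)
print(simplify(doc))  # {'id': '7', 'item': [{'sku': 'A1'}, {'sku': 'B2'}]}
```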
While those conclude the major enhancements, additional improvements include:
- A JMS Enqueue step is now available to submit an XML or JSON message to a JMS Queue or Topic accessible via a secure agent.
- Dequeuing (queue and topics) of XML or JSON request payloads is now fully supported.
It’s amazing how fast a year goes by. Last year, Informatica Cloud exhibited at Amazon re:Invent for the very first time where we showcased our connector for Amazon Redshift. At the time, customers were simply kicking the tires on Amazon’s newest cloud data warehousing service, and trying to learn where it might make sense to fit Amazon Redshift into their overall architecture. This year, it was clear that customers had adopted several AWS services and were truly “all-in” on the cloud. In the words of Andy Jassy, Senior VP of Amazon Web Services, “Cloud has become the new normal”.
During Day 1 of the keynote, Andy outlined several areas of growth across the AWS ecosystem, such as a 137% YoY increase in data transfer to and from Amazon S3, and a 99% YoY increase in Amazon EC2 instance usage. On Day 2 of the keynote, Werner Vogels, CTO of Amazon, made the case that there has never been a better time to build apps on AWS because of all the enterprise-grade features. Several customers came on stage during both keynotes to demonstrate their use of AWS:
- Major League Baseball’s Statcast application consumed 17PB of raw data
- Philips Healthcare used over a petabyte a month
- Intuit revealed their plan to move the rest of their applications to AWS over the next few years
- Johnson & Johnson outlined their use of Amazon’s Virtual Private Cloud (VPC) and referred to their use of hybrid cloud as the “borderless datacenter”
- Omnifone illustrated how AWS has the network bandwidth required to deliver their hi-res audio offerings
- The Weather Company scaled AWS across 4 regions to deliver 15 billion forecast publications a day
Informatica was also mentioned on stage by Andy Jassy as one of the premier ISVs that had built solutions on top of the AWS platform. Indeed, from having one connector in the AWS ecosystem last year (for Amazon Redshift), Informatica has released native connectors for Amazon DynamoDB, Elastic MapReduce (EMR), S3, Kinesis, and RDS.
With so many customers using AWS, it becomes hard for them to track their usage at a granular level; this is especially true for enterprise companies, where a multitude of departments and business units use several AWS services. Informatica Cloud and Tableau developed a joint solution, showcased at the Amazon re:Invent Partner Theater, that lets an IT Operations individual drill down into several dimensions to find the answers they need around AWS usage and cost. IT Ops personnel can pick out the relevant data points in their data model, such as availability zone, rate, and usage type, to name a few, and use Amazon Redshift as the cloud data warehouse to aggregate this data. Informatica Cloud’s Vibe Integration Packages, combined with its native connectivity to Amazon Redshift and S3, allow the data model to be reflected as the correct set of tables in Redshift. Tableau’s robust visualization capabilities then allow users to drill down into the data model to extract whatever insights they require. Look for more to come from Informatica Cloud and Tableau on this joint solution in the upcoming weeks and months.
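As a rough idea of the kind of roll-up such a solution runs once usage records are in Redshift, here is a hypothetical query sketch; the table and column names are illustrative, not the joint solution’s actual data model.

```python
# Hypothetical roll-up of AWS usage records already loaded into Redshift.
import psycopg2

conn = psycopg2.connect(
    host="usage-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="awsusage", user="admin", password="<password>",
)

with conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT availability_zone,
                   usage_type,
                   SUM(usage_quantity * rate) AS estimated_cost
            FROM aws_usage
            GROUP BY availability_zone, usage_type
            ORDER BY estimated_cost DESC;
        """)
        for az, usage_type, cost in cur.fetchall():
            print(az, usage_type, cost)
```

A BI tool like Tableau would point at the same tables and let users slice along those dimensions interactively.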
“The NIH multi-institute awards constitute an initial investment of nearly $32 million in fiscal year 2014 by NIH’s Big Data to Knowledge (BD2K) initiative and will support development of new software, tools and training to improve access to these data and the ability to make new discoveries using them, NIH said in its announcement of the funding.”
The grants will address issues around Big Data adoption, including:
- Locating data and the appropriate software tools to access and analyze the information.
- Lack of data standards, or low adoption of standards across the research community.
- Insufficient policies to facilitate data sharing while protecting privacy.
- Unwillingness to collaborate that limits the data’s usefulness in the research community.
Among the tasks funded is the creation of a “Perturbation Data Coordination and Integration Center.” The center will provide support for data science research that focuses on interpreting and integrating data from different data types and databases. In other words, it will make sure the data moves to where it should move, in order to provide access to information that’s needed by the research scientist. Fundamentally, it’s data integration practices and technologies.
This is very interesting because the movement into big data systems often drives a reevaluation of, or even new interest in, data integration. As the data becomes strategically important, the need to provide core integration services becomes even more important.
The project at the NIH will be interesting to watch, as it progresses. These are the guys who come up with the new paths to us being healthier and living longer. The use of Big Data provides the researchers with the advantage of having a better understanding of patterns of data, including:
- Patterns of symptoms that lead to the diagnosis of specific diseases and ailments. Doctors may get these data points one at a time. When unstructured or structured data exists, researchers can find correlations, and thus provide better guidelines to physicians who see the patients.
- Patterns of cures that are emerging around specific treatments. The ability to determine what treatments are most effective, by looking at the data holistically.
- Patterns of failure. When the outcomes are less than desirable, what seems to be a common issue that can be identified and resolved?
Of course, the uses of big data technology are limitless, when considering the value of knowledge that can be derived from petabytes of data. However, it’s one thing to have the data, and another to have access to it.
Data integration should always be systemic to all big data strategies, and the NIH clearly understands this to be the case. Thus, they have funded data integration along with the expansion of their big data usage.
Most enterprises will follow much the same path in the next 2 to 5 years. Information provides a strategic advantage to businesses. In the case of the NIH, it’s information that can save lives. Can’t get much more important than that.