Category Archives: Cloud
I ended my previous blog wondering if awareness of Data Gravity should change our behavior. While Data Gravity adds Value to Big Data, I find that the application of the Value is under explained.
Exponential growth of data has naturally led us to want to categorize it into facts, relationships, entities, etc. This sounds very elementary. While this happens so quickly in our subconscious minds as humans, it takes significant effort to teach this to a machine.
A friend tweeted this to me last week: I paddled out today, now I look like a lobster. Since this tweet, Twitter has inundated my friend and me with promotions from Red Lobster. It is because the machine deconstructed the tweet: paddled <PROPULSION>, today <TIME>, like <PREFERENCE> and lobster <CRUSTACEANS>. While putting these together, the machine decided that the keyword was lobster. You and I both know that my friend was not talking about lobsters.
You may think that this maybe just a funny edge case. You can confuse any computer system if you try hard enough, right? Unfortunately, this isn’t an edge case. 140 characters has not just changed people’s tweets, it has changed how people talk on the web. More and more information is communicated in smaller and smaller amounts of language, and this trend is only going to continue.
When will the machine understand that “I look like a lobster” means I am sunburned?
I believe the reason that there are not hundreds of companies exploiting machine-learning techniques to generate a truly semantic web, is the lack of weighted edges in publicly available ontologies. Keep reading, it will all make sense in about 5 sentences. Lobster and Sunscreen are 7 hops away from each other in dbPedia – way too many to draw any correlation between the two. For that matter, any article in Wikipedia is connected to any other article within about 14 hops, and that’s the extreme. Completed unrelated concepts are often just a few hops from each other.
But by analyzing massive amounts of both written and spoken English text from articles, books, social media, and television, it is possible for a machine to automatically draw a correlation and create a weighted edge between the Lobsters and Sunscreen nodes that effectively short circuits the 7 hops necessary. Many organizations are dumping massive amounts of facts without weights into our repositories of total human knowledge because they are naïvely attempting to categorize everything without realizing that the repositories of human knowledge need to mimic how humans use knowledge.
For example – if you hear the name Babe Ruth, what is the first thing that pops to mind? Roman Catholics from Maryland born in the 1800s or Famous Baseball Player?
If you look in Wikipedia today, he is categorized under 28 categories in Wikipedia, each of them with the same level of attachment. 1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore | Maryland | Vaudeville performers.
Now imagine how confused a machine would get when the distance of unweighted edges between nodes is used as a scoring mechanism for relevancy.
If I were to design an algorithm that uses weighted edges (on a scale of 1-5, with 5 being the highest), the same search would yield a much more obvious result.
1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore | Maryland | Vaudeville performers .
Now the machine starts to think more like a human. The above example forces us to ask ourselves the relevancy a.k.a. Value of the response. This is where I think Data Gravity’s becomes relevant.
You can contact me on twitter @bigdatabeat with your comments.
Amazon Web Services and Informatica Deliver Data-Ready Cloud Computing Infrastructure for Every Business
At re:Invent 2014 in Las Vegas, Informatica and AWS announced a broad strategic partnership to deliver data-ready cloud computing infrastructure to any type or size of business.
Informatica’s comprehensive portfolio across Informatica Cloud and PowerCenter solutions connect to multiple AWS Data Services including Amazon Redshift, RDS, DynamoDB, S3, EMR and Kinesis – the broadest pre-built connectivity available to AWS Data Services. Informatica and AWS offerings are pre-integrated, enabling customers to rapidly and cost-effectively implement data warehousing, large scale analytics, lift and shift, and other key use cases in cloud-first and hybrid IT environments. Now, any company can use Informatica’s portfolio to get a plug-and-play on-ramp to the cloud with AWS.
Economical and Flexible Path to the Cloud
As business information needs intensify and data environments become more complex, the combination of AWS and Informatica enables organizations to increase the flexibility and reduce the costs of their information infrastructures through:
- More cost-effective data warehousing and analytics – Customers benefit from lower costs and increased agility when unlocking the value of their data with no on-premise data warehousing/analytics environment to design, deploy and manage.
- Broad, easy connectivity to AWS – Customers gain full flexibility in integrating data from any Informatica-supported data source (the broadest set of sources supported by any integration vendor) through the use of pre-built connectors for AWS.
- Seamless hybrid integration – Hybrid integration scenarios across Informatica PowerCenter and Informatica Cloud data integration deployments are able to connect seamlessly to AWS services.
- Comprehensive use case coverage – Informatica solutions for data integration and warehousing, data archiving, data streaming and big data across cloud and on-premise applications mesh with AWS solutions such as RDS, Redshift, Kinesis, S3, DynamoDB, EMR and other AWS ecosystem services to drive new and rapid value for customers.
New Support for AWS Services
Informatica introduced a number of new Informatica Cloud integrations with AWS services, including connectors for Amazon DynamoDB, Amazon Elastic MapReduce (Amazon EMR) and Amazon Simple Storage Service (Amazon S3), to complement the existing connectors for Amazon Redshift and Amazon Relational Database Service (Amazon RDS).
Additionally, the latest Informatica PowerCenter release for Amazon Elastic Compute Cloud (Amazon EC2) includes support for:
- PowerCenter Standard Edition and Data Quality Standard Edition
- Scaling options – Grid, high availability, pushdown optimization, partitioning
- Connectivity to Amazon RDS and Amazon Redshift
- Domain and repository DB in Amazon RDS for current database PAM (policies and measures)
To learn more, try our 60-day free Informatica Cloud trial for Amazon Redshift.
If you’re in Vegas, please come by our booth at re:Invent, Nov. 11-14, in Booth #1031, Venetian / Sands, Hall.
“The NIH multi-institute awards constitute an initial investment of nearly $32 million in fiscal year 2014 by NIH’s Big Data to Knowledge (BD2K) initiative and will support development of new software, tools and training to improve access to these data and the ability to make new discoveries using them, NIH said in its announcement of the funding.”
The grants will address issues around Big Data adoption, including:
- Locating data and the appropriate software tools to access and analyze the information.
- Lack of data standards, or low adoption of standards across the research community.
- Insufficient polices to facilitate data sharing while protecting privacy.
- Unwillingness to collaborate that limits the data’s usefulness in the research community.
Among the tasks funded is the creation of a “Perturbation Data Coordination and Integration Center.” The center will provide support for data science research that focuses on interpreting and integrating data from different data types and databases. In other words, it will make sure the data moves to where it should move, in order to provide access to information that’s needed by the research scientist. Fundamentally, it’s data integration practices and technologies.
This is very interesting from the standpoint that the movement into big data systems often drives the reevaluation, or even new interest in data integration. As the data becomes strategically important, the need to provide core integration services becomes even more important.
The project at the NIH will be interesting to watch, as it progresses. These are the guys who come up with the new paths to us being healthier and living longer. The use of Big Data provides the researchers with the advantage of having a better understanding of patterns of data, including:
- Patterns of symptoms that lead to the diagnosis of specific diseases and ailments. Doctors may get these data points one at a time. When unstructured or structured data exists, researchers can find correlations, and thus provide better guidelines to physicians who see the patients.
- Patterns of cures that are emerging around specific treatments. The ability to determine what treatments are most effective, by looking at the data holistically.
- Patterns of failure. When the outcomes are less than desirable, what seems to be a common issue that can be identified and resolved?
Of course, the uses of big data technology are limitless, when considering the value of knowledge that can be derived from petabytes of data. However, it’s one thing to have the data, and another to have access to it.
Data integration should always be systemic to all big data strategies, and the NIH clearly understands this to be the case. Thus, they have funded data integration along with the expansion of their big data usage.
Most enterprises will follow much the same path in the next 2 to 5 years. Information provides a strategic advantage to businesses. In the case of the NIH, it’s information that can save lives. Can’t get much more important than that.
This was a great week of excitement and innovation here in San Francisco starting with the San Francisco Giants winning the National League Pennant for the 3rd time in 5 years on the same day Saleforce’s Dreamforce 2014 wrapped up their largest customer conference with over 140K+ attendees from all over the world talking about their new Customer Success Platform.
Salesforce has come a long way from their humble beginnings as the new kid on the cloud front for CRM. The integrated sales, marketing, support, collaboration, application, and analytics as part of the Salesforce Customer Success Platform exemplifies innovation and significant business value upside for various industries however I see it very promising for today’s financial services industry. However like any new business application, the value business gains from it are dependent in having the right data available for the business.
The reality is, SaaS adoption by financial institutions has not been as quick as other industries due to privacy concerns, regulations that govern what data can reside in public infrastructures, ability to customize to fit their business needs, cultural barriers within larger institutions that critical business applications must reside on-premise for control and management purposes, and the challenges of integrating data to and from existing systems with SaaS applications. However, experts are optimistic that the industry may have turned the corner. Gartner (NYSE:IT) asserts more than 60 percent of banks worldwide will process the majority of their transactions in the cloud by 2016. Let’s take a closer look at some of the challenges and what’s required to overcome these obstacles when adopting cloud solutions to power your business.
Challenge #1: Integrating and sharing data between SaaS and on-premise must not be taken lightly
For most banks and insurance companies considering new SaaS based CRM, Marketing, and Support applications with solutions from Salesforce and others must consider the importance of migrating and sharing data between cloud and on-premise applications in their investment decisions. Migrating existing customer, account, and transaction history data is often done by IT staff through the use of custom extracts, scripts, and manual data validations which can carry over invalid information from legacy systems making these new application investments useless in many cases.
For example, customer type descriptions from one or many existing systems may be correct in their respective databases however collapsing them into a common field in the target application seems easy to do. Unfortunately, these transformation rules can be complex and that complexity increases when dealing with tens if not hundreds of applications during the migration and synchronization phase. Having capable solutions to support the testing, development, quality management, validation, and delivery of existing data from old to new is not only good practice, but a proven way of avoiding costly workarounds and business pain in the future.
Challenge 2: Managing and sharing a trusted source of shared business information across the enterprise.
As new SaaS applications are adopted, it is critical to understand how to best govern and synchronize common business information such as customer contact information (e.g. address, phone, email) across the enterprise. Most banks and insurance companies have multiple systems that create and update critical customer contact information, many of them which reside on-premise. For example, insurance customers who update contact information such as a phone number or email address while filing an insurance claim will often result in that claims specialist to enter/update only the claims system given the siloed nature of many traditional banking and insurance companies. This is the power of Master Data Management which is purposely designed to identify changes to master data including customer records in one or many systems, update the customer master record, and share that across other systems that house and require that update is essential for business continuity and success.
In conclusion, SaaS adoption will continue to grow in financial services and across other industries. The silver lining in the cloud is your data and the technology that supports the consumption and distribution of it across the enterprise. Banks and insurance companies investing in new SaaS solutions will operate in a hybrid environment made up of Cloud and core transaction systems that reside on-premise. Cloud adoption will continue to grow and to ensure investments yield value for businesses, it is important to invest in a capable and scalable data integration platform to integrate, govern, and share data in a hybrid eco-system. To learn more on how to deal with these challenges, click here and download a complimentary copy of the new “Salesforce Integration for Dummies”
The security of information systems is a complex, shared responsibility between infrastructure, system and application providers. Informatica doesn’t take lightly the responsibility our customers have entrusted to us in this complex risk equation.
As Informatica’s Chief Information Security Officer, I’d like to share three important security updates with our customers:
- What you need to know about Informatica products and services relative to the latest industry-wide security concern,
- What you need to do to secure Informatica products against the ShellShock vulnerability, and
- How to contact Informatica if you have questions about Informatica product security.
1 – What you need to know
On September 24, 2014 a serious new cluster of vulnerabilities to Linux/Unix distributions was announced, classified as (CVE-2014-6271, CVE-2014-7169, CVE-2014-7186, CVE-2014-7187, CVE-2014-6277 and CVE-2014-6278) aka “Shellshock” or “Bashdoor”. What makes ShellShock so impactful is that it requires relatively low effort or expertise to exploit and gain privileged access to vulnerable systems.
Informatica’s cloud-hosted products, including Informatica Cloud Services (ICS) and our recently-launched Springbok beta, have already been patched to address this issue. We continue to monitor for relevant updates to both vulnerabilities and available patches.
Because this vulnerability is a function of the underlying Operating System, we encourage administrators of potentially vulnerable systems to assess their risk levels and apply patches and/or other appropriate countermeasures.
Informatica’s Information Security team coordinated an internal response with product developers to assess the vulnerability and make recommendations necessary for our on-premise products. Specific products and actions are listed below.
2 – What you need to do
Informatica products themselves require no patches to address the Shellshock vulnerability, they are not directly impacted. However, Informatica strongly recommends that you apply your OS vendors’ patches as they become available, since some applications allow customers to use shell scripts in their pre-and post-processing scripts. Specific Informatica products and remediations are listed below:
|Cloud Service||Version||Patch / Remediation|
|Springbok||Beta||No action necessary. The Springbok infrastructure has been patched by Informatica Cloud Operations.|
|ActiveVOS/Cloud||All||No action necessary. The ActiveVOS/Cloud infrastructure has been patched by Informatica Cloud Operations.|
|Cloud/ICS||All||Customers should apply OS patches to all of their machines running a Cloud agent. Relevant Cloud/ICS hosted infrastructure has already been patched by Informatica Cloud Operations.|
|Product||Version||Patch / Remediation|
|PowerCenter||All||No direct impact. Customers who use shell scripts within their pre- / post-processing steps should apply OS patches to mitigate this vulnerability.|
|IDQ||All||No direct impact. Customers who use shell scripts within their pre- / post-processing steps should apply OS patches to mitigate this vulnerability.|
|MM, BG, IDE||All||No direct impact. Customers who use shell scripts within their pre- / post-processing steps should apply OS patches to mitigate this vulnerability.|
|Data Services / Mercury stack||All|
|PWX mainframe & CDC||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|UM, VDS||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|IDR, IFC||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|B2B DT, UDT, hparser, Atlantic||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|Data Archive||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|Dynamic data masking||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|IDV||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|SAP Nearline||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed..|
|TDM||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|MDM||All||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|IR / name3||No direct impact. Recommend customers apply OS patch to all machines with INFA product installed.|
|B2B DX / DIH||All||DX & DIH on Red Hat Customers should apply OS patches. Other OS customers still recommended to apply OS patch.|
|PIM||All||PIM core and Procurement are not not directly impacted. Recommend Media Manager customers apply OS patch to all machines with INFA product installed.|
|ActiveVOS||All||No direct impact for on-premise ActiveVOS product. Cloud-realtime has already been patched.|
|Address Doctor||All||No direct impact for AD services run on Windows. Procurement service has already been patched by Informatica Cloud Operations.|
|StrikeIron||All||No direct impact.|
3 – How to contact Informatica about security
Informatica takes the security of our customers’ data very seriously. Please contact our Informatica’s Knowledge Base (article ID 301574), or our Global Customer Support team if you have any questions or concerns. The Informatica support portal is always available at http://mysupport.informatica.com.
If you are security researcher and have identified a potential vulnerability in an Informatica product or service, please follow our Responsible Disclosure Program.
Bill Burns, VP & Chief Information Security Officer
With Informatica Cloud, we’ve long tracked the growth of the various cloud apps and its adoption in the enterprise. Common business patterns – such as opportunity-to-order, employee onboarding, data migration and business intelligence – that once took place solely on-premises are now being conducted both in the cloud and on-premises.
The fact is that we are well on our way to a world where our business needs are best met by a mix of on-premises and cloud applications. Regardless of what we do or make, we can no longer get away with just on-premises applications – or at least not for long. As we become more reliant on cloud services, such as those offered by Oracle, Salesforce, SAP, NetSuite, Workday, we are embracing the reality of a new hybrid world, and the imperative for simpler integration it demands.
So, as the ground shifts beneath us, moving us toward the hybrid world, we, as business and IT users, are left standing with a choice: Continue to seek solutions in our existing on-premises integration stacks, or go beyond, to find them with the newer and simpler cloud solution. Let us briefly look at five business patterns we’ve been tracking.
One of the first things we’ve noticed with the hybrid environment is the incredible frequency with which data is moved back and forth between the on-premises and cloud environments. We call this the data integration pattern, and it is best represented by getting data, such as price list or inventory from Oracle E-Business into a cloud app so that the actual user of the cloud app can view the most updated information. Here the data (usually master data) is copied toserves a certain purpose. Data Integration also involves the typical needs of data to be transformed before it can be inserted or updated. The understanding of metadata and data models of the involved applications is key to do this effectively and repeatedly.
The second is the application integration pattern, or the real time transaction flow between your on-premises and cloud environment, where you have business processes and services that need to communicate with one another. Here, the data needs to be referenced in real time for a knowledge worker to take action.
The third, data warehousing in the cloud, is an emerging pattern that is gaining importance for both mid- and large-size companies. In this pattern, businesses are moving massive amounts of data in bulk from both on-premises and cloud sources into a cloud data warehouse, such as Amazon Redshift, for BI analysis.
The fourth, the Internet of Things (IOT) pattern, is also emerging and is becoming more important, especially as new technologies and products, such as Nest, enable us to push streaming data (sensor data, web logs, etc.) and combine them with other cloud and on-premises data sources into a cloud data store. Often the data is unstructured and hence it is critical for an integration platform to effectively deal with unstructured data.
The fifth and final pattern, API integration, is gaining prominence in the cloud. Here, an on-premise or cloud application exposes the data or service as an external API that can be consumed directly by applications or by a higher-level composite app in an orchestration.
While there are certainly different approaches to the challenges brought by Hybrid IT, cloud integration is often best-suited to solving them.
First, while the integration problems are more or less similar to the on-premise world, the patterns now overlap between cloud and on-premise. Second, integration responsibility is now picked up at the edge, closer to the users, whom we call “citizen integrators”. Third, time to market and agility demands that any integration platform you work with can live up to your expectations of speed. There are no longer multiyear integration initiatives in the era of the cloud. Finally, the same values that made cloud application adoption attractive (such as time-to-value, manageability, low operational overhead) also apply to cloud integration.
One of the most important forces driving cloud adoption is the need for companies to put more power into hands of the business user. These users often need to access data in other systems and they are quite comfortable going through the motions of doing so without actually being aware that they are performing integration. We call this class of users ‘Citizen Integrators’. For example, if a user uploads an excel file to Salesforce, it’s not something they would call as “integration”. It is an out-of-the-box action that is integrated with their user experience and is simple to use from a tooling point of view and oftentimes native within the application they are working with.
Cloud Integration Convergence is driving many integration use cases. The most common integration – such as employee onboarding – can span multiple integration patterns. It involves data integration, application integration and often data warehousing for business intelligence. If we agree that doing this in the cloud makes sense, the question is whether you need three different integration stacks in the cloud for each integration pattern. And even if you have three different stacks, what if an integration flow involves the comingling of multiple patterns? What we are noticing is a single Cloud Integration platform to address more and more of these use cases and also providing the tooling for both a Citizen Integrator as well as an experienced Integration Developer.
The bottom line is that in the new hybrid world we are seeing a convergence, where the industry is moving towards streamlined and lighter weight solutions that can handle multiple patterns with one platform.
The concept of Cloud Integration Convergence is an important one and we have built its imperatives into our products. With our cloud integration platform, we combine the ability to handle any integration pattern with an easy-to-use interface that empowers citizen integrators, and frees integration developers for more rigorous projects. And because we’re Informatica, we’ve designed it to work in tandem with PowerCenter, which means anything you’ve developed for PowerCenter can be leveraged for Informatica Cloud and vice versa thereby fulfilling Informatica’s promise of Map Once, Deploy Anywhere.
In closing, I invite you to visit us at the Informatica booth at Oracle Open World in booth #3512 in Moscone West. I’ll be there with some of my colleagues, and we would be happy to meet and talk with you about your experiences and challenges with the new Hybrid IT world.
Amazon Redshift, one of the fast-rising stars in the AWS ecosystem has taken the data warehousing world by storm ever since it was introduced almost two years ago. Amazon Redshift operates completely in the cloud, and allows you to provision nodes on-demand. This model allows you to overcome many of the pains associated with traditional data warehousing techniques, such as provisioning extra server hardware, sizing and preparing databases for loading or extensive SQL scripting.
However, when loading data into Redshift, you may find it challenging to do so in a timely manner. To reduce the time taken to load this data, you may have to spend a tremendous amount of time writing SQL optimization queries which takes away the value proposition of using Redshift in the first place.
Informatica Cloud helps you load this data quickly into Redshift in just a few minutes. To start using Informatica Cloud, you’ll need to establish connections from Redshift and your other data source first. Here are a few easy steps to help you get started with establishing connections from a relational database such as MySQL as well as Redshift into Informatica Cloud:
- Login into your Informatica Cloud account, go to Configure -> Connections, click “New”, and select “MySQL” for “Type”
- Select your Secure Agent and fill in the rest of the database details:
- Test your connection and then click ‘OK’ to save and exit
- Now, login to your AWS account and go to Redshift service page
- Go to your cluster configuration page and make a note of the cluster and cluster database properties: Number of Nodes, Endpoint, Port, Database Name, JDBC URL. You also will need:
- The Redshift database user name and password (which is different from your AWS account)
- AWS account Access Key
- AWS account Secret Key
- Exit the AWS console.
- Now, back in your Informatica Cloud account, go to Configure -> Connections and click “New”.
- Select “AWS Redshift (Informatica)” for “Type” and fill in the rest of the details from the information you have from above
- Test the connection and then click ‘OK’ to save and exit
As you can see, establishing connections was extremely easy and can be done in less than 5 minutes. To learn how customers such as UBM used Informatica Cloud to deliver next-generation customer insights with Amazon Redshift, please join us on September 16 for a webinar where we’ll have product experts from Amazon and UBM explaining how your company can benefit from cloud data warehousing for petabyte-scale analytics using Amazon Redshift.
What I love about the cloud is it has something of value to offer practically any government organization, regardless of size, maturity, point of view, approach. Even for the most conservative IT shops, there are use cases that just plain make sense. And with the growing availability of FEDRAMP certified offerings, it’s becoming easier to procure. But, thinking realistically, for reasons of law, budget, time, architecture, we know the cloud will not be the solution for every public sector problem. Some applications, some data will never leave your agency’s premises. And here in lies the new complexity. You have applications and data on-prem. You have applications and data in the cloud. And you have business requirements that require these apps to work together, to share data.
So, now that you have a hybrid environment, what can you do about? Let’s face it, we can talk about technology, architecture and approaches all day long, but, it always comes down to this, what should be done with the data. You need answers to questions such as; Is it safe? Is it accessible? It is reliable? How do I know if the integrity has been compromised? What about the quality? How error-prone is the data? How complete is the data? How do we manage it across this new hybrid landscape? How can I get data from a public cloud application to my on-prem data warehouse? How can I leverage the flexibility of public IaaS to build a new application that will need access to data that is also required for an on-prem legacy application?
I know many government IT professional are wrestling with these questions and seeking solutions. So, here’s an interesting thought. Most of these questions are not exactly new, they are just taking on the added context of the cloud. Prior to the cloud, many agencies discovered answers in the form of a data integration platform. The platform is used to ensure every application, every user has access to the data they need to perform their mission or job. I think of it this way. The platform is a “standardized” abstraction layer that ensures all your data gets to where it needs to be, when it needs to be there, in the form it needs to be in. There are hundreds of government IT shops using such an approach.
Here’s the good news. This approach to integrating data can be extended to include the cloud. Imagine placing “agents” in all the places where your data needs to live, the agents capable of communicating with each other to integrate, alter or move data. Now add to this the idea of a cloud-based remote control that allows you to control all the functions of the agents. Using such a platform now enables your agency to tie on-prem systems to cloud systems, minimizing the effect of having multiple silos of information. Now government workers and warfighters will have the ability to more quickly get complete, accurate data, regardless of where it originates and citizens will benefit from more effectively delivered services.
How would such an approach change your ideas on how to leverage the cloud for your agency? If you live near the Washington, DC area, you may wish to drop in on the Government Cloud Computing and Data Center Conference & Expo. One of my colleagues, Ronen Schwartz will be discussing this topic. For those not in the vicinity, you can learn more here.
Earlier this month, CNBC.com published its first ever R&D All-Stars: CNBC RQ 50, ranking the top 50 public companies by return on research and development investment. Coming in the top ten, and the first pure software play was Informatica, mentioned as first among great software companies like Google, Amazon, and Salesforce. CNBC.com is referencing a companion article by David Spiegel – Boring stocks that generate R&D heat-and profits. The article made an excellent point: When R&D productivity links R&D spending to corporate revenue growth and market value, it is a better gauge of the productivity of that spending.
Unlike other R&D lists or rankings, the RQ50 was less concerned with pure dollars than what the company actually did with it. The RQ50 measures increase in revenue as it relates to increase in R&D expenditures. Its methodology was provided by Professor Anne Marie Knott, of Washington University in St. Louis, who tracks and studies corporate R&D investment, and has found that the companies that regularly turn R&D into income typically place innovation at the forefront of the corporate mission and have a structure and culture that support it.
Informatica is on the list because its revenue gains between 2006 and 2013 correlate directly with its increased R&D investment over the same period. While the list specifically cites the 2013 figures, the result is due to a systematic and long-term strategic initiative to place innovation at the core of our business plan.
Informatica has innovated broadly across its product spectrum. I can personally speak to one area where it has invested smartly and made significant gains – Informatica Cloud. Informatica decided to make its initial investment in the cloud in 2006 and was early in the market with regards to cloud integration. In fact, back in 2006, very few of today’s well-known SaaS companies were even publicly traded. The most popular SaaS app today, Salesforce.com had revenues of just $309 million in FY2006 compared with over $4 billion in FY2014. Amazon EC2, one of the core services of Amazon Web Services (AWS) itself had only been announced in that year. Apart from EC2, Amazon only had six other services in 2006. In 2014, that number has ballooned to over 30.
In his article about the RQ50, Spiegel talks about how the companies on the list aren’t just listening to what customers want or need now. They’re also challenging themselves to come up with the things the market can use two or ten years into the future. In 2006, Informatica took the same approach with its initial investment in cloud integration.
For us, it started with an observation and then a commitment to the belief that we were at an inflection point with the cloud, and on the cusp of what was going to become a true megatrend that represented a huge opportunity for the integration industry. Informatica assembled a small, agile group made up of strong leaders with varying skills and experience pulled from different areas—sales, engineering, and product management — throughout the company. It also meant throwing away the traditional measures of success and identifying new and more appropriate metrics to benchmark our progress. And finally, it included partnering with like-minded companies like Salesforce and NetSuite initially, and later on with Amazon, and taking our core strength – on-premise data integration technology – and pivoting it into a new direction.
The result was the first iteration of the Informatica Cloud. It leveraged the fruit of our R&D investment – the Vibe Virtual Data Machine – to provide SaaS administrators and line of business IT with the ability to perform lightweight cloud integrations between their on-premise and cloud applications without the involvement of an integration developer. Subsequent work and innovation have continued along the same path, adding tools like drag-and-drop design interfaces and mapping wizards, with the end goal of giving line-of-business (LOB) IT, cloud application administrators and citizen integrators a single platform to perform all the integration patterns they require, on their timeline. Informatica Cloud has consistently delivered 2-3 releases every year, and is now already on Release 20. From originally starting out with Data Replication for Salesforce, the Cloud team added bigger and better functionality such as developing connectivity for over 100 applications and data protocols, opening up our integration services through REST APIs, going beyond integration by incorporating cloud master data management and cloud test data management capabilities, and most recently announcing optimized batch and real-time cloud integration under a single unified platform.
And it goes on to this day, with investments in new innovations and directions, like Informatica Project Springbok. With Project Springbok, we’re duplicating what we did with Informatica Cloud but this time for citizen integrators. We’re using our vast experiences working with customers and building cutting-edge technology IP over the last 20 years and enabling citizen integrators to harmonize data faster for better insights (and hopefully, less late nights writing spreadsheet formulas). What we do after Project Springbok is anyone’s guess, but wherever that is, it will be sure to put us on lists like the RQ 50 for some time to come.