I ended my previous blog wondering if awareness of Data Gravity should change our behavior. While Data Gravity adds Value to Big Data, I find that the application of the Value is underexplained.
Exponential growth of data has naturally led us to want to categorize it into facts, relationships, entities, etc. This sounds very elementary. While this happens so quickly in our subconscious minds as humans, it takes significant effort to teach this to a machine.
A friend tweeted this to me last week: I paddled out today, now I look like a lobster. Since this tweet, Twitter has inundated my friend and me with promotions from Red Lobster. It is because the machine deconstructed the tweet: paddled <PROPULSION>, today <TIME>, like <PREFERENCE> and lobster <CRUSTACEANS>. While putting these together, the machine decided that the keyword was lobster. You and I both know that my friend was not talking about lobsters.
You may think that this may be just a funny edge case. You can confuse any computer system if you try hard enough, right? Unfortunately, this isn’t an edge case. The 140-character limit has not just changed people’s tweets, it has changed how people talk on the web. More and more information is communicated in smaller and smaller amounts of language, and this trend is only going to continue.
When will the machine understand that “I look like a lobster” means I am sunburned?
I believe the reason that there are not hundreds of companies exploiting machine-learning techniques to generate a truly semantic web is the lack of weighted edges in publicly available ontologies. Keep reading, it will all make sense in about 5 sentences. Lobster and Sunscreen are 7 hops away from each other in DBpedia – way too many to draw any correlation between the two. For that matter, any article in Wikipedia is connected to any other article within about 14 hops, and that’s the extreme. Completely unrelated concepts are often just a few hops from each other.
But by analyzing massive amounts of both written and spoken English text from articles, books, social media, and television, it is possible for a machine to automatically draw a correlation and create a weighted edge between the Lobsters and Sunscreen nodes that effectively short circuits the 7 hops necessary. Many organizations are dumping massive amounts of facts without weights into our repositories of total human knowledge because they are naïvely attempting to categorize everything without realizing that the repositories of human knowledge need to mimic how humans use knowledge.
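To make the idea of a learned weighted edge concrete, here is a minimal sketch. It assumes a toy corpus and uses pointwise mutual information (PMI) over document-level co-occurrence as the edge weight; the function name `cooccurrence_weights`, the scoring choice, and the sample sentences are my own illustration, not a description of how any particular ontology or product computes weights.

```python
import math
import re
from collections import Counter
from itertools import combinations


def cooccurrence_weights(documents, vocabulary):
    """Return {(a, b): pmi} for every concept pair seen together in a document.

    Hypothetical helper: PMI is one of several reasonable choices of weight.
    """
    n_docs = len(documents)
    term_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        # Crude tokenizer: lowercase alphabetic runs only.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(tokens & set(vocabulary))
        term_counts.update(present)
        pair_counts.update(combinations(present, 2))

    weights = {}
    for (a, b), together in pair_counts.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
        # estimated as document frequencies.
        weights[(a, b)] = math.log2(
            together * n_docs / (term_counts[a] * term_counts[b])
        )
    return weights


# Illustrative corpus; real input would be articles, books, and social media.
corpus = [
    "Forgot sunscreen and now I look like a lobster",
    "Red as a lobster after the beach, should have worn sunscreen",
    "Grilled lobster with butter for dinner",
    "Sunscreen is on sale this week",
    "Butter and bread for breakfast",
]
edges = cooccurrence_weights(corpus, {"lobster", "sunscreen", "butter"})
for pair, weight in sorted(edges.items(), key=lambda item: -item[1]):
    print(pair, round(weight, 3))
```

In this toy corpus the lobster and sunscreen pair ends up with a higher weight than lobster and butter, which is exactly the kind of direct edge that would short-circuit a long chain of unweighted hops. A real system would need far more text and a more careful notion of context than whole-document co-occurrence.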
For example – if you hear the name Babe Ruth, what is the first thing that pops to mind? Roman Catholics from Maryland born in the 1800s or Famous Baseball Player?
If you look in Wikipedia today, he is listed under 28 categories, each of them with the same level of attachment: 1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore, Maryland | Vaudeville performers.
Now imagine how confused a machine would get when the distance of unweighted edges between nodes is used as a scoring mechanism for relevancy.
If I were to design an algorithm that uses weighted edges (on a scale of 1-5, with 5 being the highest), the same search would yield a much more obvious result.
1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore, Maryland | Vaudeville performers.
Now the machine starts to think more like a human. The above example forces us to ask ourselves about the relevancy, a.k.a. Value, of the response. This is where I think Data Gravity becomes relevant.
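As a rough sketch of the difference, the snippet below scores a handful of the Babe Ruth categories two ways. The weights on the 1-5 scale are hypothetical values I made up for illustration, and the function names are my own; nothing here comes from Wikipedia or DBpedia.

```python
# Hypothetical weights (1-5, 5 = strongest association). Invented for
# illustration only; Wikipedia does not publish such weights.
CATEGORY_WEIGHTS = {
    "National Baseball Hall of Fame inductees": 5,
    "New York Yankees players": 5,
    "American League home run champions": 4,
    "Major League Baseball pitchers": 3,
    "Vaudeville performers": 2,
    "American Roman Catholics": 1,
    "1895 births": 1,
}


def score_by_hops(categories):
    """Unweighted view: every category is one hop from the article, so all tie."""
    return {name: 1 for name in categories}


def rank_by_weight(weights, top_n=3):
    """Weighted view: strongest associations first, ties broken by name."""
    ordered = sorted(weights, key=lambda name: (-weights[name], name))
    return ordered[:top_n]


hop_scores = score_by_hops(CATEGORY_WEIGHTS)
print(len(set(hop_scores.values())))  # a single distinct score: no ranking possible
print(rank_by_weight(CATEGORY_WEIGHTS))
```

With hop counts alone, "1895 births" and "National Baseball Hall of Fame inductees" are indistinguishable. With even crude weights, the baseball categories rise to the top, which is closer to what a person would answer.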
You can contact me on Twitter @bigdatabeat with your comments.
What I love about the cloud is that it has something of value to offer practically any government organization, regardless of size, maturity, point of view, or approach. Even for the most conservative IT shops, there are use cases that just plain make sense. And with the growing availability of FedRAMP-certified offerings, it’s becoming easier to procure. But, thinking realistically, for reasons of law, budget, time, and architecture, we know the cloud will not be the solution for every public sector problem. Some applications, some data will never leave your agency’s premises. And herein lies the new complexity. You have applications and data on-prem. You have applications and data in the cloud. And you have business requirements that require these apps to work together, to share data.
So, now that you have a hybrid environment, what can you do about it? Let’s face it, we can talk about technology, architecture and approaches all day long, but it always comes down to this: what should be done with the data? You need answers to questions such as: Is it safe? Is it accessible? Is it reliable? How do I know if the integrity has been compromised? What about the quality? How error-prone is the data? How complete is the data? How do we manage it across this new hybrid landscape? How can I get data from a public cloud application to my on-prem data warehouse? How can I leverage the flexibility of public IaaS to build a new application that will need access to data that is also required for an on-prem legacy application?
I know many government IT professionals are wrestling with these questions and seeking solutions. So, here’s an interesting thought. Most of these questions are not exactly new; they are just taking on the added context of the cloud. Prior to the cloud, many agencies discovered answers in the form of a data integration platform. The platform is used to ensure every application, every user has access to the data they need to perform their mission or job. I think of it this way. The platform is a “standardized” abstraction layer that ensures all your data gets to where it needs to be, when it needs to be there, in the form it needs to be in. There are hundreds of government IT shops using such an approach.
Here’s the good news. This approach to integrating data can be extended to include the cloud. Imagine placing “agents” in all the places where your data needs to live, with the agents capable of communicating with each other to integrate, alter or move data. Now add to this the idea of a cloud-based remote control that allows you to control all the functions of the agents. Using such a platform enables your agency to tie on-prem systems to cloud systems, minimizing the effect of having multiple silos of information. Now government workers and warfighters will be able to get complete, accurate data more quickly, regardless of where it originates, and citizens will benefit from more effectively delivered services.
How would such an approach change your ideas on how to leverage the cloud for your agency? If you live near the Washington, DC area, you may wish to drop in on the Government Cloud Computing and Data Center Conference & Expo. One of my colleagues, Ronen Schwartz, will be discussing this topic. For those not in the vicinity, you can learn more here.
SAP’s Jam social platform has generated a great deal of buzz since its release last May – and for good reason. As detailed by Alan Lepofsky in his coverage for ZDNet, the new Jam reboot included the kind of things, such as out-of-the-box integration, workflow templates and simplified developer tools, that make both IT and business users very happy.
However, the complexity of SAP’s Business Suite (or ECC as it’s called) does not easily lend itself to integration with other applications. The code underlying it is built on a proprietary language called ABAP – a combination of COBOL and Open SQL with some object-oriented features – that requires specialized knowledge and skill not easily found outside of the SAP ecosystem. Up until now, the typical integration project required the involvement of a specialized SAP consultant to develop custom ABAP code or map complex BAPI/IDoc structures, as well as a BASIS administrator to transport the ABAP code from development to QA to production. The result was expensive and manually intensive. Integration projects took a few months or longer to complete and were not agile enough to handle ongoing requirements or even field changes.
Today, Informatica Cloud offers businesses a more innovative approach to SAP data extraction, one that promotes agile development and enables rapid deployment through the following three important features.
Automatically Generating ABAP Code
At the core of Informatica’s solution is the Cloud Connector for SAP. While the face of the Connector is a simple, wizard-based, drag-and-drop interface, under the hood it uses a Remote Function Call (RFC) to dynamically generate ABAP code (based on user choices) to connect with SAP and access the data through the application layer.
Drag and Drop Design Palette
The wizard guides the user through the steps necessary to extract the data from SAP and send it to any application where it is needed. Using Informatica Cloud’s drag-and-drop design palette, one can simply choose SAP – like any other application endpoint – and select what is needed to connect to the target, without ever having to write specialized code.
Because of the dynamically generated ABAP code, SaaS application administrators trying to connect to SAP don’t have to deal with the SAP transports (and the lengthy development cycle) for each extract, reducing the time from months to weeks for an individual project, as well as the load on the BASIS administrators. The increased agility enables end users to respond to business demands by acquiring related data extracts, field and feature changes and/or additions in near real time. Informatica reduces the load on both the admin and the server even further by eliminating the need for transports, sending the data in packets, and running the extracts in the background. And since no data is staged or buffered on the SAP server, there is never a risk of compromising the system’s or online users’ performance.
Speedy Development Through Vibe Integration Packages
Informatica’s solution also includes a technology bundle to speed up development time and reduce the user’s learning curve. The bundle, or Informatica Vibe integration package, consists of downloadable templates that help the user to understand and use the complex SAP interfaces. Future roadmap releases will contain resources for additional SAS endpoints.
In the review mentioned in the opening, Lepofsky notes the importance of integration and the partner ecosystem to the Jam platform. The same can be said of the SAP Business Suite and the specialized LOB cloud apps that orbit it. Without ready and real-time access to SAP’s data, even the most feature-rich app is of little use to anyone. With Informatica’s Cloud Connector for SAP, business users, like Informatica customer Addivant, now have a simple and efficient way to solve the most pressing problems presented by SAP-to-cloud app integration.
The mainstream use of SaaS applications as part of the cloud strategies in many enterprises continues to rise. Initially led by LOB IT (lines of business, apps IT), SaaS deployments now have central IT (such as Integration Competency Centers) personnel extensively involved. This shift stems from the need to develop strategies around hybrid application deployments – environments that include integrations between cloud and on-premise applications.
The entire breadth of cloud-to-cloud and cloud-to-ground integration scenarios necessitates interacting with publicly available APIs, cloud services, and internal web services. The end goal is to enable secure, consistent data access on enterprise apps, wherein any cloud or on-premise application is accessible through a tablet or smartphone, in an intuitive, easy-to-use interface.
A key necessity for hybrid application deployments is the concept of “adaptive integration” within any integration platform-as-a-service (iPaaS). Any cloud service integration that claims to have iPaaS capabilities needs to have integration features that connect data, applications, and processes, as well as have governance and API management functionality. The iPaaS must also run on a multi-tenant infrastructure and be available on-premise at times.
You can learn more about adaptive integration, how the iPaaS impacts it, and hybrid application strategies in our recorded webinar, Enabling Hybrid Application Strategies through Cloud Service Integration, featuring Gartner Vice-President and Fellow, Massimo Pezzini, and Informatica Senior Vice-President of Data Integration, Ash Kulkarni. Key topics covered include:
- How SaaS adoption is driving the need for hybrid integration
- Why the mobilization of the enterprise means stricter criteria for an iPaaS
- How Everton Football Club in the English Premier League gained major customer insights by using Informatica Cloud
- What “Adaptive Integration” and the Internet of Things have in store for us
Now in its third year (2012, 2013), The State of Salesforce Annual Review continues to be the most comprehensive report on the Salesforce ecosystem. Based on the data from over 1,000 global Salesforce users, this report highlights how companies are using the Salesforce platform, where resources are being allocated, and where industry hype meets reality. Over the past three years, the report has evolved much like the technology, shifting and transforming to address recent advancements, as well as tracking longitudinal trends in the space.
We’ve found that key integration partners like Informatica Cloud continue to grow in importance within the Salesforce ecosystem. Beyond the core platform offerings from Salesforce, third-party apps and integration technologies have received considerable attention as companies look to extend the value of their initial investments and unite systems. Syncing multiple platforms and applications is an emerging need in the Salesforce ecosystem, and one that will be highlighted in the 2014 report.
As Salesforce usage expands, so does our approach to survey execution. In line with this evolution, here’s what we’ve learned over the last three years from data collection:
Functions, Departments Make a Difference
Sales, Marketing, IT, and Service all have their own needs and pain points. As Salesforce moves quickly across the enterprise, we want to recognize the values, priorities, and investments by each department. Not only are the primary clouds for each function at different stages of maturity, but the ways in which each department uses their cloud are unique. We anticipate discovery of how enterprises are collaborating across functions and clouds.
Focus on Region
As our international data set continues to grow, we are investing in regionalized reports for the US, UK, France, and Australia. While we saw indications of differences between each region in last year’s survey, they were not statistically significant.
Customer Engagement is a Top Priority
Everyone agrees that customer engagement is important, but what are companies actually doing about it? A section on predictive analytics and questions about engagement specific to departments have been included in this year’s survey. We suspect that the recent trend of companies empowering employees with a combination of data and mobile will be validated in the survey results.
Variation Across Industries
As an added bonus, we will build a report targeting specific insights from the Financial Services industry.
We Need Your Help
Our dataset depends on input from Salesforce users spanning all functions, roles, industries, and regions. Every response matters. Please take 15 minutes to share your Salesforce experiences, and you will receive a personalized report, comparing your responses to the aggregate survey results.
As Informatica Cloud product managers, we spend a lot of our time thinking about things like relational databases. Recently, we’ve been considering their limitations, and, specifically, how difficult and expensive it is to provision an on-premise data warehouse to handle the petabytes of fluid data generated by cloud applications and social media. As a result, companies often have to make tradeoffs and decide which data is worth putting into their data warehouse.
Certainly, relational databases have enormous value. They’ve been around for several decades and have served as a bulwark for storing and analyzing structured data. Without them, we wouldn’t be able to extract and store data from on-premise CRM, ERP and HR applications and push it downstream for BI applications to consume.
With the advent of cloud applications and social media, however, we are now faced with managing a daily barrage of massive amounts of rapidly changing data, as well as the complexities of analyzing it within the same context as data from on-premise applications. Add to that the stream of data coming from Big Data sources such as Hadoop, which then needs to be organized into a structured format so that various correlation analyses can be run by BI applications – and you can begin to understand the enormity of the problem.
Up until now, the only solution has been to throw development resources at legacy on-premise databases, and hope for the best. But given the cost and complexity, this is clearly not a sustainable long-term strategy.
As an alternative, Amazon Redshift, a petabyte-scale data warehouse service in the cloud, has the right combination of performance and capabilities to handle the demands of social media and cloud app data, without the additional complexity or expense. Its Massively Parallel Processing (MPP) architecture allows for the lightning-fast loading and querying of data. It also features a larger block size, which reduces the number of I/O requests needed to load data, and leads to better performance.
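A quick back-of-the-envelope calculation shows why block size matters. The sketch below assumes the 1 MB block size described in Amazon’s Redshift documentation and uses a 32 KB block as a stand-in for a conventional database; the 1 GiB table size and the helper name `blocks_needed` are ours, chosen purely for illustration.

```python
def blocks_needed(data_bytes, block_bytes):
    """Number of block-sized reads needed to scan data_bytes (ceiling division)."""
    return -(-data_bytes // block_bytes)


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

table_size = 1 * GIB          # illustrative table size
small_block = 32 * KIB        # assumed block size for a conventional database
large_block = 1 * MIB         # assumed Redshift block size

small_reads = blocks_needed(table_size, small_block)
large_reads = blocks_needed(table_size, large_block)
print(small_reads, large_reads, small_reads // large_reads)
```

Under these assumptions the larger block cuts the number of read requests by a factor of 32. Actual performance also depends on compression, columnar layout, and the storage hardware, none of which this arithmetic models.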
By combining Informatica Cloud with Amazon Redshift’s parallel loading architecture, you can make use of push-down optimization algorithms, which process data transformations in whichever source or target database engine is best suited to them. Informatica Cloud also offers native connectivity to cloud and social media apps, such as Salesforce, NetSuite, Workday, LinkedIn, and Twitter, to name a few, which makes it easy to funnel data from these apps into your Amazon Redshift cluster at faster speeds.
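The general idea behind push-down can be sketched in a few lines. This is not Informatica Cloud’s implementation: it uses Python’s built-in `sqlite3` module as a stand-in for the database engine, and the `orders` table and function names are hypothetical. The first function pulls every row out and aggregates in the client; the second pushes the same transformation into the engine as SQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("east", 5.0), ("west", 7.5)],
    )
    return conn


def totals_without_pushdown(conn):
    """Fetch every row and aggregate in the integration layer."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_with_pushdown(conn):
    """Let the database engine do the aggregation; only results travel back."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    )
    return dict(cursor.fetchall())


conn = build_demo_db()
print(totals_without_pushdown(conn))
print(totals_with_pushdown(conn))
```

Both functions return the same totals, but the push-down version moves one row per region across the wire instead of one row per order. On a warehouse with parallel execution, that difference is where most of the benefit would come from.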
If you’re at the Amazon Web Services Summit today in New York City, then you heard our announcement that Informatica Cloud is offering a free 60-day trial for Amazon Redshift with no limitations on the number of rows, jobs, application endpoints, or scheduling. If you’d like to learn more, please visit our Redshift Trial page or go directly to the trial.