Category Archives: Cloud Computing
As a Tesla owner, I recently had the experience of calling Tesla service after a yellow warning message appeared on the center console of my car: “Check tire pressure system. Call Tesla Service.” While still on the freeway, I voice-dialed Tesla with my iPhone and was in touch with a service representative within minutes.
Me: A yellow warning message just appeared on my dash and also the center console.
Tesla rep: Yes, I see – is it the tire pressure warning?
Me: Yes – do I need to pull into a gas station? I haven’t had to visit a gas station since I purchased the car.
Tesla rep: Well, I also see that you are traveling on a freeway that has some steep elevation – it’s possible the higher altitude is affecting your car’s tires temporarily until the pressure equalizes. Let me check your tire pressure monitoring sensor in a half hour. If the sensor still detects a problem, I will call you and give further instructions.
As it turned out, the warning message disappeared after ten minutes and everything was fine for the rest of the trip. However, the episode served as a reminder that the world will be much different with the advent of the Internet of Things. Just as humans connected with mobile phones become more productive, machines and devices connected to the network become more useful. In this case, a connected automobile allowed the service rep to remotely access vehicle data, read the tire pressure sensor as well as the vehicle location and elevation, and suggest a course of action. This example is fairly basic compared to the opportunities afforded by networked devices and machines.
In addition to remote servicing, there are several other use case categories that offer great potential, including:
- Preventative Maintenance – monitor usage data and increase the overall uptime for machines/devices while decreasing the cost of upkeep. e.g., Tesla runs remote diagnostics on vehicles and has the ability to identify vehicle problems before they occur.
- Realtime Product Enhancements – analyze product usage data and deliver improvements quickly in response. e.g., Tesla delivers software updates that improve the usability of the vehicle based on analysis of owner usage.
- Higher Efficiency in Business Operations – analyze consolidated enterprise transaction data with machine data to identify opportunities to achieve greater operational efficiency. e.g., Tesla deployed waves of new fast charging stations (known as superchargers) based upon analyzing the travel patterns of its vehicle owners.
- Differentiated Product/Service Offerings – deliver a new class of applications that operate on correlated data across a broad spectrum of sources (HINT for Tesla: a trip planning application that estimates energy consumption and recommends charging stops would be really cool…)
In each case, machine data is integrated with other data (traditional enterprise data, vehicle owner registration data, etc.) to create business value. Just as important as the connectivity of the devices and machines is the ability to integrate the data. Several Informatica customers have begun investing in M2M (aka Internet of Things) infrastructure and Informatica technology has been critical to their efforts. US Xpress utilizes mobile sensors on its vast fleet of trucks and Informatica delivers the ability to consolidate, cleanse and integrate the data they collect.
My recent episode with Tesla service was a simple, yet eye-opening experience. With more and more machines and devices getting wirelessly connected and the ability to integrate the tremendous volumes of data being generated, this example is only a small hint of more interesting things to come.
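To make the "consolidate, cleanse and integrate" idea concrete, here is a minimal sketch in Python. It is not Informatica's or Tesla's implementation: the `SensorReading` type, the `OWNERS` lookup, the plausibility range, and the sample values are all hypothetical and only illustrate joining machine data with registration data.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    vehicle_id: str
    tire_pressure_psi: float
    elevation_m: float


# Hypothetical enterprise data: owner registration keyed by vehicle id.
OWNERS = {"V1": "Alice", "V2": "Bob"}


def cleanse(readings):
    """Drop readings whose pressure is outside an assumed plausible range."""
    return [r for r in readings if 0 < r.tire_pressure_psi < 100]


def integrate(readings, owners):
    """Attach owner names to readings; skip vehicles with no registration."""
    return [
        {
            "owner": owners[r.vehicle_id],
            "vehicle_id": r.vehicle_id,
            "tire_pressure_psi": r.tire_pressure_psi,
            "elevation_m": r.elevation_m,
        }
        for r in readings
        if r.vehicle_id in owners
    ]


readings = [
    SensorReading("V1", 38.5, 1200.0),
    SensorReading("V2", -1.0, 30.0),  # sensor glitch, removed by cleanse()
    SensorReading("V3", 41.0, 15.0),  # no registration record, skipped
]
combined = integrate(cleanse(readings), OWNERS)
```

Only the reading for vehicle `V1` survives both steps, which is the point: the combined record is useful to a service rep because the sensor data and the owner data arrive together.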
Many Salesforce developers that use sandbox environments for test and development suffer from the following challenges:
- Lack of relevant data for proper testing and development (empty sandboxes)
- To fix that problem, they manually copy data from production
- Which results in exposing sensitive data to unauthorized users
- And potentially consuming more storage than allocated for sandbox environments (resulting in unexpected costs)
To address these challenges, Informatica just released Cloud Test Data Management for Salesforce. This solution is designed to give Salesforce admins and developers the ability to provision secure test data subsets to developers through an easy-to-use, wizard-driven approach. The application is delivered as a service through a subscription-based pricing model.
The Informatica IT team uses Salesforce internally and validated an ROI based on reducing the amount of developer time used to manually script copying data from production to a sandbox, reducing the amount of time fixing defects due to not having the right test data, and eliminating the risk of a data breach by masking sensitive data.
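The two ideas behind secure test data, subsetting and masking, can be sketched in a few lines of Python. This is an illustration only and does not reflect how Cloud Test Data Management works internally; the record layout, the `mask_email` and `make_test_subset` helpers, and the hashing approach are assumptions made for the example.

```python
import hashlib


def mask_email(email: str) -> str:
    """Replace an email with a deterministic placeholder.

    Assumption for this sketch: an unsalted hash prefix is good enough.
    A real masking tool would use stronger techniques.
    """
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:10]
    return f"user_{digest}@example.com"


def make_test_subset(records, limit):
    """Take the first `limit` records and mask the sensitive field in each."""
    subset = records[:limit]
    # Build new dicts so the production records are never modified.
    return [{**r, "email": mask_email(r["email"])} for r in subset]


# Hypothetical production rows.
production = [
    {"id": 1, "name": "Account A", "email": "alice@corp.example"},
    {"id": 2, "name": "Account B", "email": "bob@corp.example"},
    {"id": 3, "name": "Account C", "email": "carol@corp.example"},
]
sandbox = make_test_subset(production, limit=2)
```

The sandbox receives fewer rows than production, which helps with storage limits, and the email values are replaced while the original rows stay untouched.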
To learn more about this new offering, watch a demonstration that shows how to create secure test data subsets for Salesforce. Also, available now, try the free Cloud Data Masking app or take a 30-day Cloud Test Data Management trial.
An explosion in mobile devices and social media usage has been the driving force behind large brands using big data solutions for deep, insightful analytics. In fact, a recent mobile consumer survey found that 71% of people used their mobile devices to access social media.
With social media becoming a major avenue for advertising, and mobile devices being the medium of access, there are numerous data points that global brands can cross-reference to get a more complete picture of their consumers and their buying propensities. Analyzing these multitudes of data points is the reason behind the rise of big data solutions such as Hadoop.
However, Hadoop itself is only one Big Data framework, and it comes in several different flavors. Facebook, which called itself the owner of the world’s largest Hadoop cluster, at 100 petabytes, outgrew its capabilities on Hadoop and is looking into a technology which would allow it to abstract its Hadoop workloads across several geographically dispersed datacenters.
When it comes to analytics projects that require intensive data warehousing, there is no one-size-fits-all answer for Big Data as the use cases can be extremely varied, ranging from short-term to long-term. Deploying Hadoop clusters requires specialized skills and proper capacity planning. In contrast, Big Data solutions in the cloud such as Amazon Redshift allow users to provision database nodes on demand and in a matter of minutes, without the need to take into account large outlays of infrastructure such as servers and datacenter space. As a result, cloud-based Big Data can be a viable alternative for short-term analytics projects as well as fulfilling sandbox requirements to test out larger Big Data integration projects. Cloud-based Big Data may also make sense in situations where only a subset of the data is required for analysis as opposed to the entire dataset.
With cloud integration, much of the complexity of connecting to data sources and targets is abstracted away. Consequently, when a cloud-based Big Data deployment is combined with a cloud integration solution, it can result in even more time and cost savings and get the projects off the ground much faster.
We’ll be discussing several use cases around cloud-based Big Data in our webinar on August 22nd, Big Data in the Cloud with Informatica Cloud and Amazon Redshift, with special guests from Amazon on the event.
I’m astounded by the incredible turnout and response to MDM Day and other MDM-related events at Informatica World, and again, I see this as a sign of MDM’s importance in the business world. Attendees told their stories, swapped best practices, and shared their visions of using MDM to improve up-sell, cross-sell, and other important business metrics. But now let’s keep the momentum going. Here I want to tell you about three free webinars that will help you to dive more deeply into MDM, and take your initiatives to the next level. The first is for any large organization, and the other two are for pharmaceutical companies. (more…)
In my last blog post on the Vibe Virtual Data Machine (VDM), I wrote about the history of Vibe. Now I will cover a little more, at a high level, on what is in the Vibe Virtual Data Machine as well as a little bit of information on how it works.
The Informatica Vibe virtual data machine is a data management engine that knows how to ingest data and then very efficiently transform, cleanse, manage, or combine it with other data. It is the core engine that drives the Informatica Platform. You can’t buy the Vibe VDM standalone; it comes with every version of Informatica PowerCenter as well as other products like our federation services, PowerCenter Big Data Edition for Hadoop, Informatica Data Quality, and the Informatica Cloud products.
The Vibe VDM works by receiving a set of instructions that describe the data source(s) from which it will extract data, the rules and flow by which that data will be transformed, analyzed, masked, archived, matched, or cleansed, and ultimately where that data will be loaded when the processing is finished.
The instruction set is generated by creating a graphical mapping of the data flow as well as the transformation and data cleansing logic that is part of that flow. The graphical instructions are then converted into code that Vibe interprets as its instruction set. One other important thing to know about Vibe is that it is most often run as a standalone engine running on Linux, Unix or Windows. However, it also runs directly on Hadoop, and when it is used as part of the Informatica Cloud set of products, it is a key component of the on-premise agent that is controlled and managed by the Informatica Cloud.
Lastly, the Vibe VDM is available for deployment as an SDK that can be embedded into an application. So instead of moving data to a data integration engine for processing, you can move the engine to the data. This concept of embedding a VDM into an application is the same idea as building an application on an application server. One way to think about Vibe is as a very use-case-specific application server, built for handling the data integration and quality aspects of an application.
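The general pattern described above, an engine that interprets a declarative instruction set naming a source, a sequence of transformations, and a target, can be illustrated with a toy Python sketch. This is not Vibe's actual instruction format or API; the `instructions` dictionary, the transformation names, and the `run` function are all hypothetical.

```python
# Hypothetical instruction set: a source, ordered transformation steps, a target.
instructions = {
    "source": [
        {"name": " alice ", "city": "austin"},
        {"name": "BOB", "city": ""},
    ],
    "steps": [
        ("cleanse", "name"),          # trim whitespace and normalize case
        ("filter_nonempty", "city"),  # drop rows with an empty city
    ],
    "target": [],
}


def cleanse(rows, field):
    """Return new rows with the given field trimmed and title-cased."""
    return [{**row, field: row[field].strip().title()} for row in rows]


def filter_nonempty(rows, field):
    """Keep only rows where the given field is non-empty."""
    return [row for row in rows if row[field]]


# A tiny stand-in for a transformation library: name -> function.
TRANSFORMATIONS = {"cleanse": cleanse, "filter_nonempty": filter_nonempty}


def run(instructions):
    """Interpret the instruction set: read, apply each step in order, load."""
    rows = instructions["source"]
    for op, field in instructions["steps"]:
        rows = TRANSFORMATIONS[op](rows, field)
    instructions["target"].extend(rows)
    return instructions["target"]


result = run(instructions)
```

Because the logic lives in the instruction set rather than in the engine, the same description could in principle be executed wherever the engine is deployed, which is the appeal of the approach.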
Vibe consists of a number of fundamental components (see Figure below):
Transformation Library: This is a collection of useful, prebuilt transformations that the engine calls to combine, transform, cleanse, match, and mask data. For those familiar with PowerCenter or Informatica Data Quality, this library is represented by the icons that the developer can drag and drop onto the canvas to perform actions on data.
Optimizer: The Optimizer compiles data processing logic into an internal representation to ensure effective resource usage and efficient run time based on data characteristics and execution environment configurations.
Executor: This is a run-time execution engine that orchestrates the data logic using the appropriate transformations. The engine reads/writes data from an adapter or directly streams the data from an application. The executor can physically move data or can present results via data virtualization.
Connectors: Informatica’s connectivity extensions provide data access from various data sources. This is what allows Informatica Platform users to connect to almost any data source or application for use by a variety of data movement technologies and modes, including batch, request/response, and publish/subscribe.
Vibe Software Development Kit (SDK): While not shown in the diagram above, Vibe provides APIs and extensions that allow third parties to add new connectors as well as transformations. So developers are not limited
Hopefully this brief overview helps you understand a little more about what Vibe is all about. If you have questions, post them below and either I or one of the Informatica team members will respond so you can understand how Vibe is going to energize the data integration industry.
Data is everywhere. It’s in databases and applications spread across your enterprise. It’s in the hands of your customers and partners. It’s in cloud applications and cloud servers. It’s on spreadsheets and documents on your employees’ laptops and tablets. It’s in smartphones, sensors and GPS devices. It’s in the blogosphere, the twittersphere and your friends’ Facebook timelines. (more…)
Hosting Big Data applications in the cloud has compelling advantages. Scale doesn’t become as overwhelming an issue as it is within on-premise systems. IT will no longer feel compelled to throw more disks at burgeoning storage requirements, and performance becomes the contractual obligation of someone else outside the organization.
Cloud may help clear up some of the costlier and thornier problems of attempting to manage Big Data environments, but it also creates some new issues. As Ron Exler of Saugatuck Technology recently pointed out in a new report, cloud-based solutions “can be quickly configured to address some big data business needs, enabling outsourcing and potentially faster implementations.” However, he adds, employing the cloud also brings some risks.
Data security is one major risk area, and I could write many posts on this. But management issues also present other challenges. Too many organizations see cloud as a cure-all for their application and data management ills, but broken processes are never fixed when new technology is applied to them. There are also plenty of risks with the misappropriation of big data, and the cloud won’t make these risks go away. Exler lists some of the risks that stem from over-reliance on cloud technology, from the late delivery of business reports to the delivery of incorrect business information, resulting in decisions based on incorrect source data. Sound familiar? The gremlins that have haunted data analytics and management for years simply won’t disappear behind a cloud.
Exler makes three recommendations for moving big data into cloud environments – note that the solutions he proposes have nothing to do with technology, and everything to do with management:
1) Analyze the growth trajectory of your data and your business. Typically, organizations will have a lot of different moving parts and interfaces. And, as the business grows and changes, it will be constantly adding new data sources. As Exler notes, “processing integration or hand off points in such piecemeal approaches represent high risk to data in the chain of possession – from collection points to raw data to data edits to data combination to data warehouse to analytics engine to viewing applications on multiple platforms.” Business growth and future requirements should be analyzed and modeled to make sure cloud engagements will be able “to provide adequate system performance, availability, and scalability to account for the projected business expansion,” he states.
2) Address data quality issues as close to the source as possible. Because both cloud and big data environments have so many moving parts, “finding the source of a data problem can be a significant challenge,” Exler warns. “Finding problems upstream in the data flow prevent time-consuming and expensive reprocessing that could be needed should errors be discovered downstream.” Such quality issues have a substantial business cost as well. When data errors are found, it becomes “an expensive company-wide fire drill to correct the data,” he says.
3) Build your project management, teamwork and communication skills. Big data and cloud projects involve so many people and components from across the enterprise that they require coordination and interaction between various specialists, subject matter experts, vendors, and outsourcing partners. “This coordination is not simple,” Exler warns. “Each group involved likely has different sets of terminology, work habits, communications methods, and documentation standards. Each group also has different priorities; oftentimes such new projects are delegated to lower priority for supporting groups.” Project managers must be leaders and understand the value of open and regular communications.
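Although these recommendations are about management rather than technology, the second one, catching quality problems close to the source, is easy to picture in code. The Python sketch below is an assumption-laden illustration, not something from Exler's report: the field names `customer_id` and `amount` and the validation rules are hypothetical.

```python
def validate(record):
    """Return a list of problems found in one incoming record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid amount")
    return problems


def ingest(records):
    """Split records at the point of collection into accepted and rejected."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            # Keep the reasons with the record so the source can be fixed.
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical incoming records.
incoming = [
    {"customer_id": "C1", "amount": 25.0},
    {"customer_id": "", "amount": 10.0},
    {"customer_id": "C3", "amount": -5},
]
accepted, rejected = ingest(incoming)
```

Rejecting the two bad records at ingestion, with the reasons attached, is what spares the downstream warehouse and analytics layers from reprocessing later.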
In a recent Information Management blog post, Alex Bakker from Saugatuck Technology noted:
“There is an underlying problem facing many, if not most enterprise IT leaders and organizations: these technologies [Cloud, Mobile, Social, Analytics and Integration] have developed much faster than enterprise IT groups and practices have been able to adopt and manage them.”
The good news is a topic that was once considered the “Achilles heel of cloud computing” is increasingly being recognized as the key enabler of cloud success: integration and data management. With that in mind, Informatica rolled out our Summer 2013 release this week. Here are some highlights and useful resources. (more…)
In Ashwin Viswanath’s previous video blog, he spoke about why it is important to have a cloud integration solution that has purpose-built integration applications. In this video, he delves deeper into the security aspects of cloud integration and how to rapidly provision integration environments for distributed business units, subsidiaries and departments.
According to Doug Henschen, Executive Editor at InformationWeek, “Despite the weak economy and zero growth in many IT salary categories, business intelligence (BI), analytics, information-integration and data warehousing professionals are seeing a slow-but-steady rise in income.” (more…)