Category Archives: Cloud Computing
Once upon a time, database schema changes were rare and handled with scrutiny. The stability of source data led to the development of the traditional Data Integration model. In this traditional model, a developer pulled a fixed number of source fields into an integration, transformed these fields, and then mapped the data into appropriate target fields.
The world of data has profoundly changed. Today’s Cloud applications allow an administrator to add custom fields to an object at a moment’s notice. Because source data is increasingly malleable, the traditional Data Integration model is no longer optimal. The Data Integration model must evolve.
Today’s integrations must dynamically adapt to ever-changing environments.
To meet these demands, Informatica has built the Informatica Cloud Mapping Designer. The Mapping Designer provides power and adaptability to integrations through the “link rules” and “incoming field rules” features. Integration developers no longer need to deal with fields on a one-by-one basis. Cloud Designer allows the integration developer to specify a set of dynamic “rules” that tell the mapping how fields need to be handled.
For example, the default rule is “Include all fields”, which is both simple and powerful. The “all fields” rule dynamically resolves to bring in as many fields as exist at the source at run time. However many new fields the application developer or database administrator has added to the source after the integration was developed, this simple rule brings all of them into the integration dynamically. This greatly increases developer productivity, because the integration developer is not making modifications just to keep up with changes to the integration endpoints. Instead, the integration is “future proofed”.
Link rules can be defined in combination using both “includes” and “excludes” criteria. The rules can be of four types:
- Include or Exclude All fields
- Include or Exclude Fields of a particular datatype (example: String, numeric, decimal, datetime, blob etc)
- Include or Exclude Fields that fit a name pattern (example: any field that ends with “_c” or any field that starts with “Shipping_”)
- Include or Exclude Fields by a particular name (example: “Id”, “Name” etc)
Any combination of the link rules can be put together to create sophisticated dynamic rules for fields to flow.
Each transformation in the integration can specify the set of rules that determine what fields flow into that particular transformation. For example, if I need all custom fields from a Salesforce source to flow into a target, I would simply “Include fields by name pattern: suffixed with ‘_c’” – which is the naming convention for custom field names in Salesforce. In another example, if I need to standardize date formats for all datetime fields in an expression, I can define a rule to “Include fields by datatype – datetime”.
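The four rule types above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Mapping Designer's implementation: the `Field`, `Rule`, and `resolve_fields` names are hypothetical, and the "last matching rule wins" ordering is an assumption made for the sketch.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    name: str
    datatype: str  # e.g. "string", "datetime", "decimal"


@dataclass(frozen=True)
class Rule:
    """Hypothetical link rule. A rule with no criteria matches all fields."""
    action: str                      # "include" or "exclude"
    datatype: Optional[str] = None   # match fields of this datatype
    pattern: Optional[str] = None    # glob-style name pattern, e.g. "*_c"
    name: Optional[str] = None       # exact field name

    def matches(self, field: Field) -> bool:
        if self.datatype is not None and field.datatype != self.datatype:
            return False
        if self.pattern is not None and not fnmatchcase(field.name, self.pattern):
            return False
        if self.name is not None and field.name != self.name:
            return False
        return True


def resolve_fields(fields: List[Field], rules: List[Rule]) -> List[Field]:
    """Apply rules in order; the last matching rule decides each field
    (an assumption of this sketch)."""
    selected = []
    for field in fields:
        keep = False  # nothing flows unless some rule includes it
        for rule in rules:
            if rule.matches(field):
                keep = rule.action == "include"
        if keep:
            selected.append(field)
    return selected


def names(fields: List[Field]) -> List[str]:
    return [f.name for f in fields]


# The source schema is read at run time, so newly added fields are picked up.
source = [
    Field("Id", "string"),
    Field("Name", "string"),
    Field("CreatedDate", "datetime"),
    Field("Region_c", "string"),
    Field("Renewal_c", "datetime"),
]

custom = resolve_fields(source, [Rule("include", pattern="*_c")])
dates = resolve_fields(source, [Rule("include", datatype="datetime")])
combined = resolve_fields(
    source,
    [Rule("include"), Rule("exclude", datatype="datetime"), Rule("exclude", name="Id")],
)
```

Because the rules are evaluated against whatever schema is passed in, adding a new `*_c` field to `source` changes the result without any change to the rules themselves.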
The dynamic nature of the link rules is what empowers a mapping created in Informatica Cloud Designer to be easily converted into a highly reusable integration template through parameterization.
For example, the entire source object can be parameterized, so the integration developer can focus on the core integration logic without having to worry about individual fields. I can build an integration that brings data into a slowly changing dimension table in a data warehouse, and this integration can apply to any source object. When the integration is executed with different source objects substituted for the source parameter, it works as expected, because the logical rules dynamically bring in the fields regardless of the source object structure. The integration developer now needs to build only one reusable integration template for replicating multiple objects to the data warehouse, NOT dozens or even hundreds of repeated integration mappings. Needless to say, maintenance is greatly simplified.
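As a rough illustration of a parameterized template, the sketch below treats the source object name as a run-time parameter and discovers the fields from the data instead of hard-coding them. The `SOURCES` dictionary, the `load_dimension` function, and the `effective_date` / `is_current` columns are hypothetical stand-ins, and the slowly changing dimension handling is heavily simplified.

```python
from datetime import date

# Hypothetical in-memory stand-in for source systems: object name -> rows.
SOURCES = {
    "Account": [{"Id": "001", "Name": "Acme", "Region_c": "West"}],
    "Contact": [{"Id": "003", "Email": "a@example.com"}],
}


def load_dimension(source_object, sources, load_date):
    """One template for any source object.

    The object name is a parameter, and every field present at run time
    flows through ("include all fields"); nothing is mapped one by one.
    """
    dimension_rows = []
    for row in sources[source_object]:
        record = dict(row)                    # copy whatever fields exist
        record["effective_date"] = load_date  # simplified SCD bookkeeping
        record["is_current"] = True
        dimension_rows.append(record)
    return dimension_rows


# The same template runs against structurally different objects.
today = date(2014, 3, 13)
accounts = load_dimension("Account", SOURCES, today)
contacts = load_dimension("Contact", SOURCES, today)
```

The point of the design is that supporting a new object means supplying a new parameter value, not writing a new mapping.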
With the power of logically defining field propagation through an integration combined with the ability to parameterize just about any part of the integration logic, the Cloud Mapping Designer provides a unique and powerful platform for developing reusable end-to-end integration solutions (such as Opportunity to Order, Accounts load to Salesforce, SAP product catalog to Salesforce, File load to Amazon Redshift etc). Such prebuilt end-to-end solutions or VIPs (Vibe Integration Packages) can be easily customized by any consuming customer to adapt to their unique environments and business needs by tweaking only certain configurations but largely reusing the core integration logic.
What could be better than building integrations? Building far fewer integrations that are reusable and self-adapting.
To learn more, join the upcoming Cloud Spring release Webinar on Thursday, March 13.
With the growing prominence of big data as both a strategic and tactical resource for enterprises, there’s been a growing shift in the scope of business intelligence. Not too long ago, BI’s world was in tools that ran on individual workstations or PCs, providing filtered reports on limited sets of data, or stacking the data into analytical cubes.
Now, BI encompasses a range of data and analytics from across the enterprise, and is as likely to be online, supported in the cloud, as it is to be on a local PC. However, as it has been for years, BI adoption still tends to be limited, not reaching its full potential. BI analyst Cindi Howson asks the question: what’s holding companies back from achieving a big impact with BI? In a recent Q&A with TDWI’s Linda Briggs, she discussed the issues raised in her new book, Successful Business Intelligence: Unlock the Value of BI and Big Data.
The success of BI depends, more than anything, on one factor, she says: corporate culture. Some organizations have achieved an analytic culture that reaches across their various business lines, but for many, it’s a challenge. “Leadership means not just the CIO but also the CEO, the lines of business, the COO, and the VP of marketing,” says Howson. “Culture and leadership are closely related, and it’s hard to separate one from the other.”
While corporate culture has always been important to success, it takes on an even more critical role in efforts to compete on analytics. For example, she explains, “companies have a lot of data, and certainly they value the data, but there is sometimes a fear of sharing it. Once you start exposing the data, somebody’s job might be on the line, or it can show that someone made some bad decisions. Maybe the data will reveal that you’ve spent millions of dollars and you’re not really getting the returns that you thought you would in pursuing a particular market segment or product.”
It’s important to see an analytics culture as focusing on data as a tool to see problems and make course corrections, or act on opportunities – not to punish or expose individuals or departments.
Another point of corporate resistance is employing BI in the cloud, a challenge recently explored by Brad Peters, CEO of Birst. Here again, corporate culture may hold back efforts to move to the cloud, which offers greater scalability and availability for BI and analytics initiatives. In a recent interview in Diginomica, he says that IT departments, for example, may throw up roadblocks, for fear of being disintermediated. Plus, there is also a recognition that once BI data is in the cloud, it often gets “harder to work with.” Multi-tenant sites, for example, have security systems and protocols that may limit users’ ability to manipulate or parse the data.
The increasing adoption of cloud-based services – such as those from Amazon or Salesforce – is gradually melting resistance to the idea of cloud-based BI, Peters adds. He particularly sees advantages for geographically dispersed workforces.
For his part, he admits that he “has never been under any illusion that the shift of enterprise analytics to the cloud was going to happen overnight.”
Leo Eweani makes the case that the data tsunami is coming. “Businesses are scrambling to respond and spending accordingly. Demand for data analysts is up by 92%; 25% of IT budgets are spent on the data integration projects required to access the value locked up in this data “ore” – it certainly seems that enterprise is doing The Right Thing – but is it?”
Data is exploding within most enterprises. However, most enterprises have no clue how to manage this data effectively. While you would think that an investment in data integration would be an area of focus, many enterprises don’t have a great track record in making data integration work. “Scratch the surface, and it emerges that 83% of IT staff expect there to be no ROI at all on data integration projects and that they are notorious for being late, over-budget and incredibly risky.”
The core message from me is that enterprises need to ‘up their game’ when it comes to data integration. This recommendation is based upon the amount of data growth we’ve already experienced, and will experience in the near future. Indeed, a “data tsunami” is on the horizon, and most enterprises are ill prepared for it.
So, how do you get prepared? While many would say it’s all about buying anything and everything when it comes to big data technology, the best approach is to splurge on planning. This means defining exactly what data assets are in place now, and will be in place in the future, and how they should or will be leveraged.
To face the forthcoming wave of data, certain planning aspects and questions about data integration rise to the top:
Performance, including data latency. Or, how quickly does the data need to flow from point or points A to point or points B? As the volume of data quickly rises, the data integration engines have got to keep up.
Data security and governance. Or, how will the data be protected both at-rest and in-flight, and how will the data be managed in terms of controls on use and change?
Abstraction, and removing data complexity. Or, how will the enterprise remap and re-purpose key enterprise data that may not currently exist in a well-defined and functional structure?
Integration with cloud-based data. Or, how will the enterprise link existing enterprise data assets with those that exist on remote cloud platforms?
While this may seem like a complex and risky process, think through the problems, leverage the right technology, and you can remove the risk and complexity. The enterprises that seem to fail at data integration do not follow that advice.
I suspect the explosion of data to be the biggest challenge enterprise IT will face in many years. While a few will take advantage of their data, most will struggle, at least initially. Which route will you take?
- Business and IT: Stop dissing each other. We all do it. Despite any platitudes about business-IT alignment, there is always griping behind closed doors. Let’s all promise to go the entire month of January without saying anything negative about the other team, and on a weekly basis express gratitude or provide positive feedback.
- Don’t let the hype fool you. Big Data. Internet of Things. Cloud/Social/Mobile (which has seemingly morphed into a single word). Hype? Definitely yes. Vaporware? Sometimes. Ignore it until “it’s real”? Definitely not. There are kernels of reality hidden in most of the hype. You have to find those kernels, and then let your mind open up to what the potential is in your own realm.
- Marry right brain with left brain. Most of us are heavy left brain people when we’re on the job. And while being data-driven, analytical and methodical are important, what separates the innovators from the followers is the spark of intuition, wisdom or creativity that is based on facts and knowledge, but not bound by it.
- Use social to discuss issues and gain knowledge rather than while away time. Social media has been extremely powerful for connecting people. But a shockingly high percentage of the social content is trivial—following celebrities; sharing selfies; updating friends on the latest meal eaten; lodging complaints about various first world problems. What if we diverted 25% of the social media time we spend on frivolous trivia to intellectual engagement and intelligent discussions about real issues? What could we change in our society if that power was unleashed?
- Use data for good. There are many uses for all the data flowing around us. Many of them are transformative: changing business models, revolutionizing industries, in some cases changing society. However, most of the ones being discussed today focus on corporate profit as opposed to societal good; think of all the investment in better targeting marketing offers to consumers. There is absolutely nothing wrong with utilizing data to grow business, and healthy businesses provide jobs, foster innovation and drive economic growth. However, business profit should not be our only goal. A few people and organizations (such as DataKind) are thinking about how to use data for good, meaning societal good. If a few more of us carve out a portion of our time and brain power to focus on potential ways data can be harnessed to benefit our broader community, imagine the impact we could have on education, healthcare, the environment, economic hardship and the other myriad challenges we face around the world. Perhaps this wish is the most Pollyannaish of them all, but I’ll keep doing what I can to forward the cause.
As a Tesla owner, I recently had the experience of calling Tesla service after a yellow warning message appeared on the center console of my car: “Check tire pressure system. Call Tesla Service.” While still on the freeway, I voice-dialed Tesla with my iPhone and was in touch with a service representative within minutes.
Me: A yellow warning message just appeared on my dash and also the center console.
Tesla rep: Yes, I see – is it the tire pressure warning?
Me: Yes – do I need to pull into a gas station? I haven’t had to visit a gas station since I purchased the car.
Tesla rep: Well, I also see that you are traveling on a freeway that has some steep elevation – it’s possible the higher altitude is affecting your car’s tires temporarily until the pressure equalizes. Let me check your tire pressure monitoring sensor in a half hour. If the sensor still detects a problem, I will call you and give further instructions.
As it turned out, the warning message disappeared after ten minutes and everything was fine for the rest of the trip. However, the episode served as a reminder that the world will be much different with the advent of the Internet of Things. Just as humans connected with mobile phones become more productive, machines and devices connected to the network become more useful. In this case, a connected automobile allowed the service rep to remotely access vehicle data, read the tire pressure sensor as well as the vehicle location and elevation, and suggest a course of action. This example is fairly basic compared to the opportunities afforded by networked devices and machines.
In addition to remote servicing, there are several other use case categories that offer great potential, including:
- Preventative Maintenance – monitor usage data and increase the overall uptime for machines/devices while decreasing the cost of upkeep. e.g., Tesla runs remote diagnostics on vehicles and has the ability to identify vehicle problems before they occur.
- Realtime Product Enhancements – analyze product usage data and deliver improvements quickly in response. e.g., Tesla delivers software updates that improve the usability of the vehicle based on analysis of owner usage.
- Higher Efficiency in Business Operations – analyze consolidated enterprise transaction data with machine data to identify opportunities to achieve greater operational efficiency. e.g., Tesla deployed waves of new fast charging stations (known as superchargers) based upon analyzing the travel patterns of its vehicle owners.
- Differentiated Product/Service Offerings – deliver new class of applications that operate on correlated data across a broad spectrum of sources (HINT for Tesla: a trip planning application that estimates energy consumption and recommends charging stops would be really cool…)
In each case, machine data is integrated with other data (traditional enterprise data, vehicle owner registration data, etc.) to create business value. Just as important as the connectivity of the devices and machines is the ability to integrate the data. Several Informatica customers have begun investing in M2M (aka Internet of Things) infrastructure, and Informatica technology has been critical to their efforts. US Xpress utilizes mobile sensors on its vast fleet of trucks, and Informatica delivers the ability to consolidate, cleanse and integrate the data they collect.
My recent episode with Tesla service was a simple, yet eye-opening experience. With more and more machines and devices becoming wirelessly connected, and with the ability to integrate the tremendous volumes of data being generated, this example is only a small hint of more interesting things to come.
Many Salesforce developers that use sandbox environments for test and development suffer from the following challenges:
- Lack of relevant data for proper testing and development (empty sandboxes)
- To fix that problem, they manually copy data from production
- Which results in exposing sensitive data to unauthorized users
- And potentially consuming more storage than allocated for sandbox environments (resulting in unexpected costs)
To address these challenges, Informatica just released Cloud Test Data Management for Salesforce. This solution is designed to give Salesforce admins and developers the ability to provision secure test data subsets to developers through an easy to use, wizard driven approach. The application is delivered as a service through a subscription-based pricing model.
The Informatica IT team uses Salesforce internally and validated an ROI based on reducing the amount of developer time used to manually script copying data from production to a sandbox, reducing the amount of time fixing defects due to not having the right test data, and eliminating the risk of a data breach by masking sensitive data.
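To make the idea of a secure test data subset concrete, here is a minimal sketch of copying a limited number of records while masking sensitive fields. It does not show how Cloud Test Data Management works internally; the `SENSITIVE_FIELDS` set, the `mask_value` and `provision_subset` functions, and the salted-hash masking scheme are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical list of fields that must never reach a sandbox unmasked.
SENSITIVE_FIELDS = {"Email", "Phone"}


def mask_value(value, salt="sandbox"):
    """Replace a value with a deterministic token so that equal inputs
    still compare equal after masking (an illustrative scheme only)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:10]


def provision_subset(rows, limit, sensitive=SENSITIVE_FIELDS):
    """Copy at most `limit` rows, masking sensitive fields on the way."""
    subset = []
    for row in rows[:limit]:  # subsetting keeps sandbox storage bounded
        masked = {
            key: mask_value(str(value))
            if key in sensitive and value is not None
            else value
            for key, value in row.items()
        }
        subset.append(masked)
    return subset


production = [
    {"Id": "003A", "Name": "Ada", "Email": "ada@example.com"},
    {"Id": "003B", "Name": "Grace", "Email": "grace@example.com"},
    {"Id": "003C", "Name": "Linus", "Email": None},
]

sandbox = provision_subset(production, limit=2)
```

Deterministic masking keeps relationships between records testable, at the cost of being weaker than true anonymization for values that are easy to guess; a real deployment would need a stronger scheme.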
To learn more about this new offering, watch a demonstration that shows how to create secure test data subsets for Salesforce. Also, available now, try the free Cloud Data Masking app or take a 30-day Cloud Test Data Management trial.
An explosion in mobile devices and social media usage has been the driving force behind large brands using big data solutions for deep, insightful analytics. In fact, a recent mobile consumer survey found that 71% of people used their mobile devices to access social media.
With social media becoming a major avenue for advertising, and mobile devices being the medium of access, there are numerous data points that global brands can cross-reference to get a more complete picture of their consumer, and their buying propensities. Analyzing these multitudes of data points is the reason behind the rise of big data solutions such as Hadoop.
However, Hadoop itself is only one Big Data framework, and consists of several different flavors. Facebook, which called itself the owner of the world’s largest Hadoop cluster, at 100 petabytes, outgrew its capabilities on Hadoop and is looking into a technology which would allow it to abstract its Hadoop workloads across several geographically dispersed datacenters.
When it comes to analytics projects that require intensive data warehousing, there is no one-size-fits-all answer for Big Data, as the use cases can be extremely varied, ranging from short-term to long-term. Deploying Hadoop clusters requires specialized skills and proper capacity planning. In contrast, Big Data solutions in the cloud such as Amazon Redshift allow users to provision database nodes on demand and in a matter of minutes, without the need to take into account large outlays of infrastructure such as servers and datacenter space. As a result, cloud-based Big Data can be a viable alternative for short-term analytics projects as well as fulfilling sandbox requirements to test out larger Big Data integration projects. Cloud-based Big Data may also make sense in situations where only a subset of the data is required for analysis as opposed to the entire dataset.
With cloud integration, much of the complexity of connecting to data sources and targets is abstracted away. Consequently, when a cloud-based Big Data deployment is combined with a cloud integration solution, it can result in even more time and cost savings and get the projects off the ground much faster.
We’ll be discussing several use cases around cloud-based Big Data in our webinar on August 22nd, Big Data in the Cloud with Informatica Cloud and Amazon Redshift, with special guests from Amazon on the event.
I’m astounded by the incredible turnout and response to MDM Day and other MDM-related events at Informatica World, and again, I see this as a sign of MDM’s importance in the business world. Attendees told their stories, swapped best-practices, and shared their visions of using MDM to improve up-sell, cross-sell, and other important business metrics. But now let’s keep the momentum going. Here I want to tell you about three free webinars that will help you to dive more deeply into MDM, and take your initiatives to the next level. The first is for any large organization, and the other two are for pharmaceutical companies.
In my last blog post on the Vibe Virtual Data Machine (VDM), I wrote about the history of Vibe. Now I will cover a little more, at a high level, on what is in the Vibe Virtual Data Machine as well as a little bit of information on how it works.
The Informatica Vibe virtual data machine is a data management engine that knows how to ingest data and then very efficiently transform, cleanse, manage, or combine it with other data. It is the core engine that drives the Informatica Platform. You can’t buy the Vibe VDM standalone; it comes with every version of Informatica PowerCenter as well as other products like our federation services, PowerCenter Big Data Edition for Hadoop, Informatica Data Quality, and the Informatica Cloud products.
The Vibe VDM works by receiving a set of instructions that describe the data source(s) from which it will extract data, the rules and flow by which that data will be transformed, analyzed, masked, archived, matched, or cleansed, and ultimately where that data will be loaded when the processing is finished.
The instruction set is generated by creating a graphical mapping of the data flow as well as the transformation and data cleansing logic that is part of that flow. The graphical instructions are then converted into code that Vibe interprets as its instruction set. One other important thing to know about Vibe is that it is most often run as a standalone engine running on Linux, Unix or Windows. However, it also runs directly on Hadoop, and when it is used as part of the Informatica Cloud set of products, it is a key component of the on-premise agent that is controlled and managed by the Informatica Cloud.
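The general pattern of an engine that interprets a declarative instruction set (source, transformation steps, target) can be sketched as follows. This is a toy illustration of the pattern only and says nothing about Vibe's actual instruction format or APIs; the `LIBRARY` of operations, the `execute` function, and the dictionary layout of `instructions` are all hypothetical.

```python
# Hypothetical transformation library: each operation takes a stream of
# rows plus a field name and yields transformed rows.
def drop_null(rows, field):
    for row in rows:
        if row.get(field) is not None:
            yield row


def trim(rows, field):
    for row in rows:
        value = row.get(field)
        yield {**row, field: value.strip() if isinstance(value, str) else value}


def upper(rows, field):
    for row in rows:
        value = row.get(field)
        yield {**row, field: value.upper() if isinstance(value, str) else value}


LIBRARY = {"drop_null": drop_null, "trim": trim, "upper": upper}


def execute(instructions, sources, targets):
    """Interpret an instruction set: read the source, chain the named
    transformations in order, and load the result into the target."""
    rows = iter(sources[instructions["source"]])
    for step in instructions["steps"]:
        operation = LIBRARY[step["op"]]
        rows = operation(rows, step["field"])  # lazily chained stream
    loaded = list(rows)
    targets.setdefault(instructions["target"], []).extend(loaded)
    return len(loaded)


sources = {"crm_accounts": [{"name": "  acme "}, {"name": None}, {"name": "globex"}]}
targets = {}
instructions = {
    "source": "crm_accounts",
    "steps": [
        {"op": "drop_null", "field": "name"},
        {"op": "trim", "field": "name"},
        {"op": "upper", "field": "name"},
    ],
    "target": "dw_accounts",
}

loaded_count = execute(instructions, sources, targets)
```

Keeping the instructions as plain data, separate from the engine that runs them, is what lets the same logic be deployed wherever the engine can run.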
Lastly, the Vibe VDM is available for deployment as an SDK that can be embedded into an application. So instead of moving data to a data integration engine for processing, you can move the engine to the data. This concept of embedding a VDM into an application is the same idea as building an application on an application server. One way to think about Vibe is as a very use-case-specific application server built for handling the data integration and quality aspects of an application.
Vibe consists of a number of fundamental components (see Figure below):
Transformation Library: This is a collection of useful, prebuilt transformations that the engine calls to combine, transform, cleanse, match, and mask data. For those familiar with PowerCenter or Informatica Data Quality, this library is represented by the icons that the developer can drag and drop onto the canvas to perform actions on data.
Optimizer: The Optimizer compiles data processing logic into an internal representation to ensure effective resource usage and efficient run time based on data characteristics and execution environment configurations.
Executor: This is a run-time execution engine that orchestrates the data logic using the appropriate transformations. The engine reads/writes data from an adapter or directly streams the data from an application. The executor can physically move data or can present results via data virtualization.
Connectors: Informatica’s connectivity extensions provide data access from various data sources. This is what allows Informatica Platform users to connect to almost any data source or application for use by a variety of data movement technologies and modes, including batch, request/response, and publish/subscribe.
Vibe Software Development Kit (SDK): While not shown in the diagram above, Vibe provides APIs and extensions that allow third parties to add new connectors as well as transformations. So developers are not limited
Hopefully this brief overview helps you understand a little more about what Vibe is all about. If you have questions, post them below and either I or one of the Informatica team members will respond so you can understand how Vibe is going to energize the data integration industry.