In my last blog post on the Vibe Virtual Data Machine (VDM), I wrote about the history of Vibe. Now I will cover a little more, at a high level, on what is in the Vibe Virtual Data Machine as well as a little bit of information on how it works.
The Informatica Vibe virtual data machine is a data management engine that knows how to ingest data and then very efficiently transform, cleanse, manage, or combine it with other data. It is the core engine that drives the Informatica Platform. You can’t buy the Vibe VDM standalone; it comes with every version of Informatica PowerCenter as well as other products such as our federation services, PowerCenter Big Data Edition for Hadoop, Informatica Data Quality, and the Informatica Cloud products.
The Vibe VDM works by receiving a set of instructions that describe the data source(s) from which it will extract data, the rules and flow by which that data will be transformed, analyzed, masked, archived, matched, or cleansed, and ultimately where that data will be loaded when the processing is finished.
The instruction set is generated by creating a graphical mapping of the data flow as well as the transformation and data cleansing logic that is part of that flow. The graphical instructions are then converted into code that Vibe interprets as its instruction set. One other important thing to know about Vibe is that it is most often run as a standalone engine on Linux, Unix, or Windows. However, it also runs directly on Hadoop, and when it is used as part of the Informatica Cloud set of products, it is a key component of the on-premises agent that is controlled and managed by the Informatica Cloud.
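To make the idea of an engine interpreting an instruction set a little more concrete, here is a minimal sketch in Python. It is not Vibe's actual instruction format or API, which this post does not describe; the `INSTRUCTIONS` dictionary, the operation names, and the `run` function are all hypothetical and exist only to illustrate the pattern of source, transformation steps, and target.

```python
import csv
import io

# Hypothetical instruction set: plain data describing the source,
# the transformation steps, and the target. Not Vibe's real format.
INSTRUCTIONS = {
    "source": {"type": "csv_text",
               "data": "name,amount\n alice ,10\nBOB,-5\ncarol,7\n"},
    "steps": [
        {"op": "trim", "field": "name"},
        {"op": "lowercase", "field": "name"},
        {"op": "cast_int", "field": "amount"},
        {"op": "filter_positive", "field": "amount"},
    ],
    "target": {"type": "list"},
}


def read_source(spec):
    """Extract rows from the source described in the instructions."""
    if spec["type"] != "csv_text":
        raise ValueError(f"unsupported source type: {spec['type']}")
    return [dict(row) for row in csv.DictReader(io.StringIO(spec["data"]))]


def apply_step(rows, step):
    """Interpret a single transformation step against all rows."""
    op, field = step["op"], step["field"]
    if op == "trim":
        return [{**r, field: r[field].strip()} for r in rows]
    if op == "lowercase":
        return [{**r, field: r[field].lower()} for r in rows]
    if op == "cast_int":
        return [{**r, field: int(r[field])} for r in rows]
    if op == "filter_positive":
        return [r for r in rows if r[field] > 0]
    raise ValueError(f"unknown op: {op}")


def run(instructions):
    """Extract, transform step by step, then hand rows to the target."""
    rows = read_source(instructions["source"])
    for step in instructions["steps"]:
        rows = apply_step(rows, step)
    if instructions["target"]["type"] != "list":
        raise ValueError("this sketch only loads into a Python list")
    return rows


result = run(INSTRUCTIONS)
print(result)
```

The point of the sketch is that the logic lives in data rather than in hand-written code, so the same small interpreter can run any flow that can be described with these steps.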
Lastly, the Vibe VDM is available for deployment as an SDK that can be embedded into an application. So instead of moving data to a data integration engine for processing, you can move the engine to the data. Embedding a VDM into an application is the same idea as building an application on an application server. One way to think about Vibe is as a highly specialized application server, built specifically to handle the data integration and data quality aspects of an application.
Vibe consists of a number of fundamental components (see Figure below):
Transformation Library: This is a collection of useful, prebuilt transformations that the engine calls to combine, transform, cleanse, match, and mask data. For those familiar with PowerCenter or Informatica Data Quality, this library is represented by the icons that the developer can drag and drop onto the canvas to perform actions on data.
Optimizer: The Optimizer compiles data processing logic into an internal representation to ensure effective resource usage and efficient run time based on data characteristics and execution environment configurations.
Executor: This is a run-time execution engine that orchestrates the data logic using the appropriate transformations. The engine reads/writes data from an adapter or directly streams the data from an application. The executor can physically move data or can present results via data virtualization.
Connectors: Informatica’s connectivity extensions provide data access from various data sources. This is what allows Informatica Platform users to connect to almost any data source or application for use by a variety of data movement technologies and modes, including batch, request/response, and publish/subscribe.
Vibe Software Development Kit (SDK): While not shown in the diagram above, Vibe provides APIs and extensions that allow third parties to add new connectors as well as transformations. So developers are not limited
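For readers who think in code, the components above can be pictured with a toy model in Python. This is a sketch under stated assumptions and not how Vibe is implemented: the `TRANSFORMS` registry stands in for the transformation library, the `transformation` decorator for an SDK-style extension point, `optimize` for the optimizer, and `execute` for the executor. All of these names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Step = Tuple[str, str]  # (transformation name, field it works on)

# Toy "transformation library": name -> function over rows.
TRANSFORMS: Dict[str, Callable[[List[Row], str], List[Row]]] = {}


def transformation(name: str):
    """Hypothetical SDK-style hook for registering a new transformation."""
    def register(fn):
        TRANSFORMS[name] = fn
        return fn
    return register


@transformation("uppercase")
def uppercase(rows: List[Row], field: str) -> List[Row]:
    return [{**r, field: str(r[field]).upper()} for r in rows]


@transformation("drop_nulls")
def drop_nulls(rows: List[Row], field: str) -> List[Row]:
    return [r for r in rows if r[field] is not None]


# Steps that only remove rows and never change values.
FILTERS = {"drop_nulls"}


def optimize(plan: List[Step]) -> List[Step]:
    """Toy optimizer: move filters ahead of earlier steps on other fields,
    so later steps process fewer rows. The result is unchanged because a
    filter on one field commutes with a per-row change to another field."""
    optimized: List[Step] = []
    for name, field in plan:
        pos = len(optimized)
        if name in FILTERS:
            while (pos > 0
                   and optimized[pos - 1][1] != field
                   and optimized[pos - 1][0] not in FILTERS):
                pos -= 1
        optimized.insert(pos, (name, field))
    return optimized


def execute(rows: List[Row], plan: List[Step]) -> List[Row]:
    """Toy executor: run the optimized plan using the registered transforms."""
    for name, field in optimize(plan):
        rows = TRANSFORMS[name](rows, field)
    return rows


# An in-memory list stands in for a connector reading from a real source.
rows = [{"name": "ann", "email": "a@example.org"},
        {"name": "bob", "email": None}]
plan = [("uppercase", "name"), ("drop_nulls", "email")]

print(optimize(plan))
print(execute(rows, plan))
```

In this toy version the developer writes the plan in whatever order is natural, and the optimizer decides that dropping rows with a missing email should happen before the uppercase step.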
Hopefully this brief overview helps you understand a little more about what Vibe is all about. If you have questions, post them below and either I or one of the Informatica team members will respond so you can understand how Vibe is going to energize the data integration industry.
Informatica Corporation CIO Tony Young talks about the benefits of the Virtual Data Machine for companies.
Informatica’s Vibe virtual data machine can streamline big data work and allow data scientists to be more efficient
Informatica introduced an embeddable Vibe engine for not only transformation, but also for data quality, data profiling, data masking and a host of other data integration tasks. It will have a meaningful impact on the data scientist shortage.
Some clear economic facts are already apparent in the current world of data. Hadoop provides a significantly less expensive platform for gathering and analyzing data; cloud computing (potentially) is a more economical computing location than on-premises, if managed well. These are clearly positive developments. On the other hand, the human resources required to exploit these new opportunities are actually quite expensive. When there is greater demand than can be met in the short term for a hot product, suppliers put customers “on allocation” to manage the distribution to the most strategic customers.
This is the situation with “data scientists,” this new breed of experts with quantitative skills, data management skills, presentation skills and deep domain expertise. Current estimates are that there are 60,000 – 120,000 unfilled positions in the US alone. Naturally, data scientists are “allocated” to the most critical (economically lucrative) efforts, and their time is limited to those tasks that most completely leverage their unique skills.
To address this shortage, industry turns to universities to develop curricula to manufacture data scientists, but this will take time. In the meantime, salaries for data scientists are very high. Unfortunately, most data science work involves a great deal of effort that does not require data science skills, especially in the areas of managing the data prior to the insightful analytics. Some estimates are that data scientists spend 50-80% of their time finding and cleaning data, managing their computing platforms and writing programs. Reducing this effort with better tools can not only make data scientists more effective, it can also have an impact on the most expensive component of big data – human resources.
Informatica today introduced Vibe, its embeddable virtual data machine, to do exactly that. Informatica has, for over 20 years, provided tools that allow developers to design and execute transformation of data without the need for writing or maintaining code. With Vibe, this capability is extended to include data quality, masking and profiling, and the engine itself can be embedded in the platforms where the work is performed. In addition, the engine can generate separate code for each target platform from a single data management design.
In the case of Hadoop, Informatica designers can continue to operate in the familiar design studio, and have Vibe generate the code for whatever platform is needed. In this way, it is possible for an Informatica developer to develop these data management routines for Hadoop without learning Hadoop or writing code in Java. And the real advantage is that the data scientist is freed from work that can be performed by those in lower pay grades, and that work can be parallelized too – multiple programmers and integration developers to one data scientist.
Vibe is a major innovation for Informatica that provides many interesting opportunities for its customers. Easing the data scientist problem is only one.
This is a guest blog penned by Neil Raden, a well-known industry figure as an author, lecturer and practitioner. He has in-depth experience as a developer, consultant and analyst in all areas of Analytics and Decision Services including Big Data strategy and implementation, Business Intelligence, Data Warehousing, Statistical/Predictive Modeling, Decision Management, and IT systems integration including assessment, architecture, planning, project management and execution. Neil has authored dozens of sponsored white papers and articles, and is a blogger and co-author of “Smart (Enough) Systems” (Prentice Hall, 2007). He has 25 years of experience as an actuary, software engineer and systems integrator.
This blog will be the first in a series about something you will be hearing more about in the near future from Informatica: the Vibe™ virtual data machine, or VDM for short.
So what is a virtual data machine? A virtual data machine (VDM) is an embeddable data management engine that accesses, aggregates and manages data.
Now that you understand what a VDM is, what is Vibe? Vibe is simply the branded name for the virtual data machine.
With that out of the way, here is a little more background on the history of the Vibe Virtual Data Machine for your reading pleasure:
The History of the Virtual Data Machine
Since the founding of Informatica Corporation 20 years ago, we have always had a philosophy of separating the development of data integration from the actual run-time implementation. This is what Informatica means when we say that the Informatica® PowerCenter® data integration product is metadata driven. The term “metadata driven” means that a developer does not have to know C, C++, or Java to perform data integration. The developer operates in a graphical development environment using drag-and-drop tools to visualize how data will move from system A, then be combined with data from system B, and then ultimately be cleansed and transformed when it finally arrives at system C. At the most detailed level of the development process, you might see icons representing data sets, and lines representing relationships coming out of those data sets going into other data sets, with descriptions of how that data is transformed along the way.
Figure 1: Informatica Developer drag-and-drop graphical development environment
However, you do not see code, just the metadata describing how the data will be modified along the way. The idea is that a person who is knowledgeable about data integration concepts, but is not necessarily a software developer, can develop data integration jobs to convert raw data into high-quality information that allows organizations to put their data potential to work. The implication is that far more people are able to develop data integration jobs because through the use of graphical tools, we have “democratized” data integration development.
Over time, however, data integration has become more complicated. It has moved from just being extract, transform, and load (ETL) for batch movement of data to also include data quality, real-time data, data virtualization, and now Hadoop. In addition, the integration process can be deployed both on premises and in the cloud. As data integration has become more complex, it has forced the use of a blended approach that often requires the use of many or most of the capabilities and approaches just mentioned while the mix and match of underlying technologies keeps expanding.
This entire time, Informatica has continued to separate the development environment from the underlying data movement and transformation technology. Why is this separation so important? It is important because as new data integration approaches come along, with new deployment models like software as a service (SaaS), new technologies such as Hadoop, and new languages such as Pig and Hive and even yet to be invented languages, existing data integration developers don’t have to learn the details of how the new technology works in order to take advantage of it. In addition, the pace at which the underlying technologies are changing in the data integration and management market is increasing. So as this pace quickens, by separating development from deployment, end-users can continue to design and develop using the same interface, and under the covers, they can take advantage of new kinds of data movement and transformation engines to virtualize data, move it in batch, move it in real time, or integrate big data, without having to learn the details of the underlying language, system, or framework.
Hopefully that gives you a good intro into the history of the VDM. In my next blog installment, I will write a little more about the basics of the Vibe VDM and how it works. So stay tuned, same Vibe time, same Vibe channel.
By 1874, Western Union President William Orton called telegraph messaging traffic “the nervous system of commerce.” In 1877 Western Union entered the telephone market using Alexander Graham Bell’s invention. And the rest, as they say, is history.
What do Western Union, Bell, and the telephone have to do with a discussion about data integration? A lot! Those early days laid the foundations for today’s connected world of business. Earlier today, Informatica released an announcement that I believe is of similar importance in unleashing the potential of the technology landscape we live in.
The speed of business is faster than ever before, fueled by technology innovations such as cloud, mobile, big data, and social. The barriers to entry into new markets are lower than ever before – big companies can act nimble and small, while small companies can appear large and achieve global reach.
McKinsey & Company recently published a study estimating that the spate of technological disruptions (cloud, mobile, internet of things, etc.) will generate between $14 and $33 trillion of economic value by 2025. Implementing specific initiatives to use these technologies for greater economic value will require better harnessing the data and information around you – supporting faster decisions, building smarter applications, and transforming businesses. It is said that the new Boeing 787 generates around 1 terabyte of data per flight per engine. Then there are approximately 500 gigabytes generated by the rest of the plane’s systems each flight. Consider how airlines can use this information to make flights safer, more efficient and more enjoyable.
One of the primary barriers to unleashing this potential is that IT requires time to adopt new technologies. For example, for airlines to unleash the value of this new information generated in-flight, they need to collect and store the data utilizing new big data technologies. Then they need to combine the data with other systems like booking and maintenance, and deliver it to some specific application. This involves new technologies which IT needs to master. Our business ideas are now outpacing our technical readiness.
The Informatica Vibe™ virtual data machine can help solve these challenges. In short, Vibe allows you to break data management tasks into two parts – (1) a visual mapping of the business logic for integrating and managing data and (2) the executable plan which is optimized and then generated to perform the data integration tasks on a variety of computing platforms. This separation of logic from physical execution allows you to map once and then deploy anywhere – over and over – without the need for recoding. Vibe seamlessly shields you from changes in the underlying technologies, programming languages, data sources, etc. If your computing platform is Oracle today and Hadoop tomorrow, no problem. Vibe lets you change deployment platforms via a simple configuration change; the virtual data machine handles the detailed changes in language and execution plan underneath. No programming skills required.
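The "map once, deploy anywhere" idea can be illustrated with a small Python sketch. It is only an illustration under stated assumptions and not Vibe's real mapping format or code generator: the `MAPPING` dictionary, the `to_sql` renderer, and the two runner functions are hypothetical. One logical mapping is run by two different back ends, plain Python and SQL on an in-memory SQLite database, and both give the same answer.

```python
import sqlite3

# Hypothetical platform-neutral mapping: what to do, not how to do it.
MAPPING = {
    "source": "orders",
    "select": ["customer", "amount"],
    "where": ("amount", ">", 100),
}

ROWS = [
    {"customer": "acme", "amount": 250},
    {"customer": "zenith", "amount": 40},
    {"customer": "globex", "amount": 130},
]


def to_sql(mapping):
    """Back end 1: render the mapping as SQL text.
    Sketch only: values are inlined and not escaped."""
    field, op, value = mapping["where"]
    cols = ", ".join(mapping["select"])
    return f"SELECT {cols} FROM {mapping['source']} WHERE {field} {op} {value}"


def run_on_sqlite(mapping, rows):
    """Load the rows into an in-memory SQLite table and run the generated SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (:customer, :amount)", rows)
    result = conn.execute(to_sql(mapping)).fetchall()
    conn.close()
    return result


def run_in_memory(mapping, rows):
    """Back end 2: interpret the same mapping directly in Python."""
    field, op, value = mapping["where"]
    if op != ">":
        raise ValueError("this sketch only supports '>'")
    return [tuple(r[c] for c in mapping["select"])
            for r in rows if r[field] > value]


print(to_sql(MAPPING))
print(sorted(run_in_memory(MAPPING, ROWS)))
print(sorted(run_on_sqlite(MAPPING, ROWS)))
```

Switching platforms here means choosing a different runner for the same `MAPPING`, which is the spirit of the configuration change described above. A real engine would of course also optimize the plan for each platform.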
While Vibe is a technology component, it has the ability to deliver great business impact by enabling you to unleash the value in your data and information.
Deliver new initiatives 5x faster. By allowing developers to focus on the mapping, cleansing, and management of data, and allowing Vibe to generate the optimal execution code, the development effort is substantially reduced. Add to this the ease with which we can add new technologies (data sources, computing platforms, data types) without requiring a deep investment in having to learn new programming skills.
Use ALL your data for your next business idea. The next wave of applications business will pursue is not in the back office (finance, payroll), but rather at the points of interaction with customers, employees or citizens. For example, the McKinsey study suggests up to $10.8 trillion in economic impact from the mobile internet. The key value in many mobile applications comes from rapidly mashing up data from a variety of sources to enable a rich customer interaction on the device. For instance, a number of retail-oriented apps are now combining your loyalty information, in-store promotions, your location, and local competitor discounts to offer you specials when you are in the store or nearby.
Reduce cost. Everyone wants to deliver IT cheaper. In addition to the speed with which your developers can develop new applications, Vibe also allows you to embrace lower-cost technologies such as Hadoop and cloud platforms. Your existing data integration skills and mappings automatically get converted into new technologies like Hadoop at the click of a button.
Future-proofing. The one thing we are learning in the technology space is that change is going to remain with us. The change is accelerating, and it is unpredictable. There will be new databases, new mobile platforms, new technologies such as virtual reality – all with the potential to drastically transform the business. For more than 15 years, Informatica has architected new, expanded capabilities on top of Vibe to handle the latest technologies as they have come to market – all aimed at lowering the barriers to information.
I started my blog discussing how telephones changed our world. Even today, we cannot imagine a world without phones. That dial tone is the promise of a connected world, and it translates your keypad entries into some code that routes your call to another person. For data, I see Vibe as a key enabling technology that can unleash the incredible potential locked up in our information to make the world a better place.
“The report of my death was an exaggeration.”
– Mark Twain
Ah yes, another conference, another old technology is declared dead. Mainframe… dead. Any programming language other than Java… dead. 8 track tapes… OK, well, some things thankfully do die, along with the Ford Pinto in which I used to listen to the Beatles Greatest Hits Red Album over and over again on that 8 track… ah yes, the good old days, but I digress.