Tag Archives: Vibe

Fast and Fasterer: Screaming Streaming Data on Hadoop


Guest Post by Dale Kim

This is a guest blog post, written by Dale Kim, Director of Product Marketing at MapR Technologies.

Recently published research shows that “faster” is better than “slower.” The point, ladies and gentlemen, is that speed, for lack of a better word, is good. Granted, you won’t always have the need for speed. My Lamborghini is handy when I need to elude the Bakersfield fuzz on I-5, but it does nothing for my Costco trips. There, I go with capacity and haul home my 30-gallon tubs of ketchup in my Ford F-150. (Note: this is a fictitious example; I don’t actually own an F-150.)

But if speed is critical, as in your data streaming application, then Informatica Vibe Data Stream and the MapR Distribution including Apache™ Hadoop® are the technologies to use together. That said, since Vibe Data Stream works with any Hadoop distribution, my discussion here is more broadly applicable. I first discussed this topic earlier this year during my presentation at Informatica World 2014. In that talk, I also briefly described architectures that include streaming components, such as the Lambda Architecture and enterprise data hubs. I recommend that any enterprise architect become familiar with these high-level architectures.

Data streaming deals with a continuous flow of data, often at a fast rate. As you might’ve suspected by now, Vibe Data Stream, based on Informatica Ultra Messaging technology, is great for that. With its roots in high-speed trading in capital markets, Ultra Messaging quickly and reliably gets high-value data from point A to point B. Vibe Data Stream adds management features to make it consumable by the rest of us, beyond stock trading. Not surprisingly, Vibe Data Stream can be used anywhere you need to quickly and reliably deliver data (just don’t use it for sharing your cat photos, please), and that’s what I discussed at Informatica World. Let me discuss two examples I gave.

Large Query Support. Let’s first look at “large queries.” I don’t mean the stuff you type into search engines, which is typically no more than 20 characters. I’m referring to an environment where the query is a huge block of data. For example, what if I have an image of an unidentified face, and I want to send it to a remote facial recognition service and immediately get the identity? The image would be the query, the facial recognition system could run on Hadoop for fast divide-and-conquer processing, and the result would be the person’s name. There are many similar use cases that could leverage a high-speed, reliable data delivery system along with a fast processing platform to get immediate answers to a data-heavy question.

Data Warehouse Onload. For another example, we turn to our old friend the data warehouse. If you’ve been following all the industry talk about data warehouse optimization, you know that pumping high-speed data directly into your data warehouse is not an efficient use of that high-value system. So instead, pipe your fast data streams into Hadoop, run some complex aggregations, then load the processed data into your warehouse. You might also consider offloading large processing jobs from your data warehouse onto Hadoop. As you process and aggregate that data, you create a data flow cycle in which you return enriched data back to the warehouse. This gives your end users efficient analysis on comprehensive data sets.
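To make the onload pattern concrete, here is a minimal sketch, assuming nothing about Vibe Data Stream’s or MapR’s actual APIs: raw events are aggregated on the big data tier first, and only the compact result is loaded into the warehouse. All names in it (the events, the load function, the table) are illustrative.

```python
# A minimal, hypothetical sketch -- not Vibe Data Stream's or MapR's actual API.
from collections import defaultdict

def aggregate(stream_events):
    """Roll raw events up to one total per key -- the kind of heavy aggregation
    you would run on Hadoop rather than inside the warehouse."""
    totals = defaultdict(float)
    for event in stream_events:
        totals[event["sensor_id"]] += event["value"]
    return totals

def load_into_warehouse(totals):
    """Stand-in for a bulk load of the enriched, aggregated data into the warehouse."""
    for sensor_id, total in totals.items():
        print(f"INSERT INTO dw.sensor_daily (sensor_id, total) VALUES ('{sensor_id}', {total})")

events = [
    {"sensor_id": "s1", "value": 1.5},
    {"sensor_id": "s2", "value": 0.7},
    {"sensor_id": "s1", "value": 2.1},
]
load_into_warehouse(aggregate(events))  # the warehouse only ever sees the summary
```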

Hopefully this stirs up ideas on how you might deploy high speed streaming in your enterprise architecture. Expect to see many new stories of interesting streaming applications in the coming months and years, especially with the anticipated proliferation of internet-of-things and sensor data.

To learn more about Vibe Data Stream, you can find it on the Informatica Marketplace.


 


Regulatory Compliance is an Opportunity, not a Cost!

Regardless of the industry, new regulatory compliance requirements are more often than not treated like the introduction of a new tax.  A few may be supportive, some will see the benefits, but most will focus on the negatives – the cost, the effort, the intrusion into private matters.  There will more than likely be a lot of grumbling.

Across many industries there is currently a lot of grumbling, as new regulation seems to be springing up all over the place.  Pharmaceutical companies have to deal with IDMP in Europe and UDI in the USA, hot on the heels of the US Sunshine Act, which is being followed in Europe by Aggregate Spend requirements.  Consumer goods companies in Europe are looking at the consequences of beefed-up EU 1169 requirements.  Financial institutions are mulling over compliance with BCBS 239.  Behind the grumbling, most organisations across all verticals appear to take a similar approach to regulatory compliance.  The pattern seems to go like this:

  1. Delay (The requirements may change)
  2. Scramble (They want it when?  Why didn’t we get more time?)
  3. Code to Spec (Provide exactly what they want, and only what they want)

No wonder these requirements are seen as purely a cost and an annoyance.  But it doesn’t have to be that way, and in fact, it should not.  Just like I have seen a pattern in response to compliance, I see a pattern in the requirements themselves:

  1. The regulators want data
  2. Their requirements will change
  3. When they do change, regulators will want even more data!

Now read the last 3 bullet points again, but use ‘executives’ or ‘management’ or ‘the business people’ instead of ‘regulators’.  The pattern still holds true.  The irony is that execs will quickly sign off on budget to meet regulatory requirements, but find it hard to see the value in “infrastructure” projects – projects that would deliver this same data to their internal teams.

This is where the opportunity comes in.  PwC’s 2013 State of Compliance report[i] shows that over 42% of central compliance budgets are in excess of $1m – a significant figure.  Efforts outside of the compliance team imply a higher actual cost.  Large budgets are not surprising in multi-national companies, which often have to satisfy multiple regulators in a number of countries.  As an alternative to multiple overlapping compliance projects, what if this significant budget were repurposed to create a flexible data management platform?  This approach could deliver compliance, but provide even more value internally.

Almost all internal teams are currently clamouring for additional data to drive their newest application.  Pharma and consumer goods sales & marketing teams would love ready access to detailed product information.  So would consumer and patient support staff, as well as down-stream partners.  Trading desks and client managers within financial institutions should have real-time access to their risk profiles to guide daily decision making.  These data needs will not be going away.  Why should regulators be prioritised over the people who drive your bottom line and who are guardians of your brand?

A flexible data management platform will serve everyone equally.  Foundational tools for a flexible data management platform exist today, including Data Quality, MDM, PIM and Vibe, Informatica’s virtual data machine.  Each of them plays a significant role in easing regulatory compliance, and as a bonus they deliver measurable business value in their own right.  Implemented correctly, you will get enhanced data agility and visibility across the entire organisation as part of your compliance efforts.  Sounds like ‘Buy One Get One Free’, or BOGOF in retail terms.

Unlike taxes, BOGOF opportunities are normally embraced with open arms.  Regulatory compliance should receive a similar welcome – an opportunity to build the foundations for universal delivery of data that is safe, clean and connected.  A 2011 study by The Economist found that effective regulatory compliance benefits businesses across a wide range of performance metrics[ii].

Is it time to get your free performance boost?


[i] Deeper Insight for Greater Strategic Value, PwC, 2013
[ii] Compliance and competitiveness, The Economist, 2011

VIBE – the solution to reduce IT costs

The beauty of the Vibe framework is that it can reduce IT costs as well as the friction between IT and business. Let us take a typical data integration project – for example, creating a simple report – which involves taking data from multiple source systems, including mainframes and AIX, and loading it into a forecasting tool (Oracle) for analyzing future demand. The project involves a business analyst, who gathers requirements from business users and produces a functional requirements specification document; a solution architect, who works from that specification to produce a high-level design (HLD) document, which is fed back to the business users for approval; a designer, who creates a low-level design (LLD) document from the HLD (the LLD serves as pseudo code); and a developer, who does the coding, unit and system integration testing, and supports UAT and deployment. In a typical outsourcing model, the business analyst and solution architect roles are filled onsite, while the designers and developers are offshore.

Let us say the total effort for this project is 100 person days (that’s three months to get just a simple report). Of those 100 person days, 10 will be devoted to requirements gathering, 20 to high-level design, 50 to low-level design, development and unit testing, 5 to system integration testing, 10 to UAT and 5 to deployment. An additional 30 person days will be needed for change requests, as there will be some changes in business requirements (such as real-time reporting or moving data into the cloud) during this time frame. So, in three months, we will get something, with lots of people involved. Let’s hope it’s what we need.

Now let us examine the effort with the Vibe framework. With Vibe, the business analyst will be able to come up with a mapping (80% reusable code) in the Informatica Analyst tool, which can be directly imported to multiple platforms such as Informatica PowerCenter, Hadoop or the cloud. So the effort for HLD and LLD is almost nil, and the development effort averages 20 person days, which covers importing the mapping, changing the sources and target, and unit testing. All other aspects remaining the same, with Vibe there will be a 45-50% effort saving. The effort for change requests will also be lower, as the process is more agile and the code can be imported to multiple platforms.
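For those who like to see the arithmetic, here is a rough back-of-the-envelope check of that 45-50% figure. The traditional person-day numbers come straight from the paragraphs above; the Vibe-side change-request estimate (15 person days) is my own assumption, consistent with the “more agile” claim.

```python
# Traditional numbers are taken from the paragraphs above; the Vibe-side
# change-request figure (15 person days) is an assumption.
traditional = {
    "requirements": 10, "hld": 20, "lld_dev_unit_test": 50,
    "sit": 5, "uat": 10, "deployment": 5, "change_requests": 30,
}
with_vibe = {
    "requirements": 10, "hld": 0, "lld_dev_unit_test": 20,
    "sit": 5, "uat": 10, "deployment": 5, "change_requests": 15,
}

total_traditional = sum(traditional.values())   # 130 person days
total_vibe = sum(with_vibe.values())            # 65 person days
saving = 1 - total_vibe / total_traditional
print(f"{total_traditional} vs {total_vibe} person days -> {saving:.0%} saving")  # ~50%
```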

Expected effort savings with the Vibe framework

Now let us look at the cost perspective. In the traditional scenario, the business analyst accounts for 20% of the total cost, the solution architect for another 20%, the designer for 20%, and the developer for the final 40%. With Vibe, the roles of solution architect and designer are taken care of by the business analyst. The developer cost is also lower, as the mapping is already partially created. So there will be a 40-50% saving in cost (both onsite cost and total cost) as well.
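The same kind of rough check works for the cost split. The role percentages are from the paragraph above; the size of the developer saving is an assumption.

```python
# Role percentages are from the paragraph above; the assumption is that the
# architect and designer shares disappear (absorbed by the business analyst)
# and the developer share shrinks by roughly a quarter because the mapping is
# already partially created.
shares = {"business_analyst": 0.20, "solution_architect": 0.20,
          "designer": 0.20, "developer": 0.40}

saving_from_roles = shares["solution_architect"] + shares["designer"]  # 0.40
saving_from_developer = shares["developer"] * 0.25                     # 0.10 (assumed)
print(f"estimated saving: {saving_from_roles:.0%} to {saving_from_roles + saving_from_developer:.0%}")
# -> estimated saving: 40% to 50%, in line with the figure quoted above
```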

Expected cost savings with the Vibe framework

In a traditional scenario, most of the delays and cost overruns are due to the differences between what is needed and what is delivered. Business users complain that IT doesn’t understand their requirements, and when you ask the IT folks, they say the requirements were not properly documented and the business “assumed” that the functionality would be delivered. This miscommunication makes the job of the business analyst very challenging and is a headache for C-level executives, because they can’t get the information they need when they need it. With Vibe, the requirements documentation challenge facing the business analyst is greatly reduced, because he or she can show the business the expected output directly (in our case, the sample output data for the forecast report). From a CEO’s perspective, it is a win-win: the friction between business and IT almost goes away, and there is a significant reduction in IT expenditure. He or she can happily concentrate on growing the business. Thus Vibe keeps everyone happy.

Sounds interesting? Get in touch with us to learn more about Vibe.


Turn change from a threat into a competitive weapon with Vibe

CIOs, CDOs and other IT executives are wrestling with a technology landscape that is not merely shifting and evolving—it’s transforming faster than mere mortals can keep up with.  And it’s impossible to predict what technologies will be the right ones for your organization three years from now.  Change is simply happening too fast. This Potential at Work article talks about the fundamental shift in architectural approach necessary to managing change, and how Informatica Vibe is the architectural secret sauce.  Check out how Tony Young and Mark Smith think about the problem and the way out of the morass, and chime in with your own ideas.


More on the Vibe Virtual Data Machine

In my last blog post on the Vibe Virtual Data Machine (VDM), I wrote about the history of Vibe.  Now I will cover, at a high level, a little more about what is in the Vibe Virtual Data Machine and how it works.

The Informatica Vibe virtual data machine is a data management engine that knows how to ingest data and then very efficiently transform, cleanse, manage, or combine it with other data. It is the core engine that drives the Informatica Platform.  You can’t buy the Vibe VDM standalone; it comes with every version of Informatica PowerCenter, as well as other products such as our federation services, PowerCenter Big Data Edition for Hadoop, Informatica Data Quality and the Informatica Cloud products.

The Vibe VDM works by receiving a set of instructions that describe the data source(s) from which it will extract data, the rules and flow by which that data will be transformed, analyzed, masked, archived, matched, or cleansed, and ultimately where that data will be loaded when the processing is finished.

The instruction set is generated by creating a graphical mapping of the data flow, along with the transformation and data cleansing logic that is part of that flow.  The graphical instructions are then converted into code that Vibe interprets as its instruction set.  One other important thing to know about Vibe is that it is most often run as a standalone engine on Linux, Unix or Windows.  However, it also runs directly on Hadoop, and when it is used as part of the Informatica Cloud set of products, it is a key component of the on-premises agent that is controlled and managed by Informatica Cloud.
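To give a feel for what such an instruction set might look like once the graphical mapping has been converted, here is a purely hypothetical sketch – not Informatica’s actual mapping format – describing sources, a flow of operations, and a target:

```python
# Hypothetical illustration only -- not Informatica's actual mapping format.
instruction_set = {
    "sources": [
        {"name": "orders_mainframe", "type": "flat_file", "path": "/data/orders.dat"},
        {"name": "orders_erp",       "type": "jdbc",      "connection": "erp_prod"},
    ],
    "flow": [
        {"op": "union",     "inputs": ["orders_mainframe", "orders_erp"]},
        {"op": "cleanse",   "rules": ["trim_whitespace", "standardize_dates"]},
        {"op": "mask",      "columns": ["customer_ssn"]},
        {"op": "aggregate", "group_by": ["product_id"], "measures": {"quantity": "sum"}},
    ],
    "target": {"type": "oracle", "table": "FORECAST_INPUT"},
}
```

Because the description is declarative, the same set of instructions can be handed to a standalone engine, a Hadoop cluster, or the cloud agent without changing the logic itself.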

Lastly, the Vibe VDM is available for deployment as an SDK that can be embedded into an application.  So instead of moving data to a data integration engine for processing, you can move the engine to the data.  This concept of embedding a VDM into an application is the same idea as building an application on an application server: one way to think about Vibe is as a very use-case-specific application server, built specifically for handling the data integration and data quality aspects of an application.

Vibe consists of a number of fundamental components (see Figure below):

Figure: The fundamental components of the Vibe VDM

Transformation Library: This is a collection of useful, prebuilt transformations that the engine calls to combine, transform, cleanse, match, and mask data. For those familiar with PowerCenter or Informatica Data Quality, this library is represented by the icons that the developer can drag and drop onto the canvas to perform actions on data.

Optimizer: The Optimizer compiles data processing logic into an internal representation to ensure effective resource usage and efficient run time, based on data characteristics and execution environment configurations.

Executor: This is a run-time execution engine that orchestrates the data logic using the appropriate transformations. The engine reads/writes data from an adapter or directly streams the data from an application.  The executor can physically move data or can present results via data virtualization.

Connectors: Informatica’s connectivity extensions provide data access from various data sources. This is what allows Informatica Platform users to connect to almost any data source or application for use by a variety of data movement technologies and modes, including batch, request/response, and publish/subscribe.

Vibe Software Development Kit (SDK): While not shown in the diagram above, Vibe provides APIs and extensions that allow third parties to add new connectors as well as transformations, so developers are not limited to what ships with the engine.
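As a purely illustrative sketch – the class names and method signatures below are invented, not the actual Vibe SDK API – an SDK of this kind typically exposes extension points along these lines: one for new connectors and one for new transformations.

```python
# Invented class names and signatures -- a sketch of the idea, not the real Vibe SDK.
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable

Record = Dict[str, Any]

class Connector(ABC):
    """Extension point for data access: read records from a new kind of source."""
    @abstractmethod
    def read(self) -> Iterable[Record]: ...

class Transformation(ABC):
    """Extension point for processing: transform one record at a time."""
    @abstractmethod
    def transform(self, record: Record) -> Record: ...

class CsvConnector(Connector):
    """Example third-party connector for plain CSV files."""
    def __init__(self, path: str):
        self.path = path
    def read(self) -> Iterable[Record]:
        import csv
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)

class UpperCaseName(Transformation):
    """Example third-party transformation: uppercase the 'name' field."""
    def transform(self, record: Record) -> Record:
        record["name"] = record.get("name", "").upper()
        return record
```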

Hopefully this brief overview helps you understand a little more about what Vibe is all about.  If you have questions, post them below and either I or one of the Informatica team members will respond so you can understand how Vibe is going to energize the data integration industry.


A CIO’s Take on Informatica’s Virtual Data Machine

Informatica Corporation CIO Tony Young talks about the benefits of the Virtual Data Machine for companies.


Informatica’s Vibe virtual data machine can streamline big data work and allow data scientists to be more efficient

Informatica introduced an embeddable Vibe engine not only for transformation, but also for data quality, data profiling, data masking and a host of other data integration tasks. It will have a meaningful impact on the data scientist shortage.

Some clear economic facts are already apparent in the current world of data. Hadoop provides a significantly less expensive platform for gathering and analyzing data; cloud computing (potentially) is a more economical computing location than on-premises, if managed well. These are clearly positive developments. On the other hand, the human resources required to exploit these new opportunities are actually quite expensive. When there is greater demand than can be met in the short term for a hot product, suppliers put customers “on allocation” to manage the distribution to the most strategic customers.

This is the situation with “data scientists,” this new breed of experts with quantitative skills, data management skills, presentation skills and deep domain expertise. Current estimates are that there are 60,000 – 120,000 unfilled positions in the US alone. Naturally, data scientists are “allocated” to the most critical (economically lucrative) efforts, and their time is limited to those tasks that most completely leverage their unique skills.

To address this shortage, industry turns to universities to develop curricula to manufacture data scientists, but this will take time. In the meantime, salaries for data scientists are very high. Unfortunately, most data science work involves a great deal of effort that does not require data science skills, especially in the areas of managing the data prior to the insightful analytics. Some estimates are that data scientists spend 50-80% of their time finding and cleaning data, managing their computing platforms and writing programs. Reducing this effort with better tools can not only make data scientists more effective, it can also have an impact on the most expensive component of big data – human resources.

Informatica today introduced Vibe, its embeddable virtual data machine, to do exactly that. Informatica has, for over 20 years, provided tools that allow developers to design and execute transformations of data without the need for writing or maintaining code. With Vibe, this capability is extended to include data quality, masking and profiling, and the engine itself can be embedded in the platforms where the work is performed. In addition, the engine can generate separate code from a single data management design.

In the case of Hadoop, Informatica designers can continue to operate in the familiar design studio and have Vibe generate the code for whatever platform is needed. In this way, it is possible for an Informatica developer to develop these data management routines for Hadoop without learning Hadoop or writing code in Java. And the real advantage is that the data scientist is freed from work that can be performed by those in lower pay grades, and that work can be parallelized too – multiple programmers and integration developers to one data scientist.

Vibe is a major innovation for Informatica that provides many interesting opportunities for its customers. Easing the data scientist problem is only one.

———————————

Neil Raden

This is a guest blog penned by Neil Raden, a well-known industry figure as an author, lecturer and practitioner. He has in-depth experience as a developer, consultant and analyst in all areas of Analytics and Decision Services, including Big Data strategy and implementation, Business Intelligence, Data Warehousing, Statistical/Predictive Modeling, Decision Management, and IT systems integration including assessment, architecture, planning, project management and execution. Neil has authored dozens of sponsored white papers and articles, is a blogger, and is co-author of “Smart (Enough) Systems” (Prentice Hall, 2007). He has 25 years of experience as an actuary, software engineer and systems integrator.


The History of Informatica Vibe Part I

This blog will be the first in a series about something you will be hearing more about from Informatica in the near future: the Vibe™ virtual data machine, or VDM for short.

So what is a virtual data machine?   A virtual data machine (VDM) is an embeddable data management engine that accesses, aggregates and manages data.

Now that you understand what a VDM is, what is Vibe?  Vibe is simply the branded name for the virtual data machine.

With that out of the way, here is a little more background on the history of the Vibe Virtual Data Machine for your reading pleasure:

The History of the Virtual Data Machine

 Since the founding of Informatica Corporation 20 years ago, we have always had a philosophy of separating the development of data integration from the actual run-time implementation. This is what Informatica means when we say that the Informatica® PowerCenter® data integration product is metadata driven. The term “metadata driven” means that a developer does not have to know C, C++, or Java to perform data integration. The developer operates in a graphical development environment using drag-and-drop tools to visualize how data will move from system A, then be combined with data from system B, and then ultimately be cleansed and transformed when it finally arrives at system C. At the most detailed level of the development process, you might see icons representing data sets, and lines representing relationships coming out of those data sets going into other data sets, with descriptions of how that data is transformed along the way.


Figure 1: Informatica Developer drag-and-drop graphical development environment

However, you do not see code, just the metadata describing how the data will be modified along the way. The idea is that a person who is knowledgeable about data integration concepts, but is not necessarily a software developer, can develop data integration jobs to convert raw data into high-quality information that allows organizations to put their data potential to work. The implication is that far more people are able to develop data integration jobs because through the use of graphical tools, we have “democratized” data integration development.

Over time, however, data integration has become more complicated. It has moved from just being extract, transform, and load (ETL) for batch movement of data to also include data quality, real-time data, data virtualization, and now Hadoop. In addition, the integration process can be deployed both on premises and in the cloud. As data integration has become more complex, it has forced the use of a blended approach that often requires many or most of the capabilities and approaches just mentioned, while the mix and match of underlying technologies keeps expanding.

This entire time, Informatica has continued to separate the development environment from the underlying data movement and transformation technology. Why is this separation so important? It is important because as new data integration approaches come along, with new deployment models like software as a service (SaaS), new technologies such as Hadoop, and new languages such as Pig and Hive and even yet to be invented languages, existing data integration developers don’t have to learn the details of how the new technology works in order to take advantage of it. In addition, the pace at which the underlying technologies are changing in the data integration and management market is increasing. So as this pace quickens, by separating development from deployment, end-users can continue to design and develop using the same interface, and under the covers, they can take advantage of new kinds of data movement and transformation engines to virtualize data, move it in batch, move it in real time, or integrate big data, without having to learn the details of the underlying language, system, or framework.

Hopefully that gives you a good intro to the history of the VDM.  In my next blog installment, I will write a little more about the basics of the Vibe VDM and how it works.  So stay tuned, same Vibe time, same Vibe channel.


Putting Business First with Data Integration

By 1874, Western Union President William Orton called telegraph messaging traffic “the nervous system of commerce.” In 1877 Western Union entered the telephone market using Alexander Graham Bell’s invention. And the rest, as they say, is history.

What do Western Union, Bell, and the telephone have to do with a discussion about data integration? A lot! Those early days laid the foundations for today’s connected world of business. Earlier today, Informatica released an announcement that I believe is of similar importance in unleashing the potential of the technology landscape we live in.

The speed of business is faster than ever before, fueled by technology innovations such as cloud, mobile, big data, and social. The barriers to entry into new markets are lower than ever before – big companies can act nimble and small, while small companies can appear large and achieve global reach.

McKinsey & Company recently published a study estimating that the spate of technological disruptions (cloud, mobile, Internet of Things, etc.) will generate between $14 and $33 trillion of economic value by 2025.  Implementing specific initiatives to use these technologies for greater economic value will require better harnessing the data and information around you – supporting faster decisions, building smarter applications, and transforming businesses. It is said that the new Boeing 787 generates around 1 terabyte of data per flight per engine, with approximately 500 gigabytes more generated by the rest of the plane’s systems each flight. Consider how airlines could use this information to make flights safer, more efficient and more enjoyable.

One of the primary barriers to unleashing this potential is that IT requires time to adopt new technologies. For example, for airlines to unleash the value of this new information generated in-flight, they need to collect and store the data utilizing new big data technologies. Then they need to combine the data with other systems like booking and maintenance, and deliver it to some specific application. This involves new technologies which IT needs to master. Our business ideas are now outpacing our technical readiness.

The Informatica Vibe™ virtual data machine can help solve these challenges. In short, Vibe allows you to break data management tasks into two parts: (1) a visual mapping of the business logic for integrating and managing data, and (2) the executable plan, which is optimized and then generated to perform the data integration tasks on a variety of computing platforms. This separation of logic from physical execution allows you to map once and then deploy anywhere – over and over – without the need for recoding. Vibe seamlessly shields you from changes in the underlying technologies, programming languages, data sources, and so on. If your computing platform is Oracle today and Hadoop tomorrow, no problem. Vibe lets you change deployment platforms via a simple configuration change – the virtual data machine handles the detailed changes in language and execution plan underneath. No programming skills required.
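Here is a minimal sketch of that map-once, deploy-anywhere idea, with toy plan generators standing in for what Vibe actually emits: the logical mapping never changes, only a configuration setting that selects the execution platform.

```python
# One logical mapping; a configuration switch picks which execution plan is
# generated. The plan generators are toy stand-ins, not what Vibe actually emits.
mapping = {
    "source": "bookings",
    "transform": "aggregate revenue by route",
    "target": "route_revenue",
}

def compile_plan(mapping: dict, platform: str) -> str:
    generators = {
        "oracle": lambda m: f"SQL plan: INSERT INTO {m['target']} SELECT ... FROM {m['source']} GROUP BY route",
        "hadoop": lambda m: f"MapReduce plan: map over {m['source']}, reduce into {m['target']}",
    }
    return generators[platform](mapping)

# Switching from Oracle today to Hadoop tomorrow is a configuration change;
# the mapping itself is untouched.
print(compile_plan(mapping, platform="oracle"))
print(compile_plan(mapping, platform="hadoop"))
```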

While Vibe is a technology component, it has the ability to deliver great business impact by enabling you to unleash the value in your data and information.

Deliver new initiatives 5x faster. By allowing developers to focus on the mapping, cleansing, and management of data, and allowing Vibe to generate the optimal execution code, the development effort is substantially reduced. Add to this the ease with which we can add new technologies (data sources, computing platforms, data types) without requiring a deep investment in having to learn new programming skills.

Use ALL your data for your next business idea. The next wave of applications business will pursue is not in the back office (finance, payroll), but rather at the points of interaction with customers, employees or citizens. For example, the McKinsey study suggests up to $10.8 trillion in economic impact from the mobile internet. The key value in many mobile applications comes from rapidly mashing up data from a variety of sources to enable a rich customer interaction on the device. For instance, a number of retail-oriented apps are now combining your loyalty information, in-store promotions, your location, and local competitor discounts to offer you specials when you are in the store or nearby.

Reduce cost. Everyone wants to deliver IT cheaper. In addition to the speed with which your developers can develop new applications, Vibe also allows you to embrace lower-cost technologies such as Hadoop and cloud platforms. Your existing data integration skills and mappings automatically get converted into new technologies like Hadoop at the click of a button.

Future-proofing. The one thing we are learning in the technology space is that change is going to remain with us. The change is accelerating, and it is unpredictable. There will be new databases, new mobile platforms, new technologies such as virtual reality – all with the potential to drastically transform the business. For more than 15 years, Informatica has architected new, expanded capabilities on top of Vibe to handle the latest technologies as they have come to market – all aimed at lowering the barriers to information.

I started my blog discussing how telephones changed our world. Even today, we cannot imagine a world without phones. That dial tone is the promise of a connected world, and it translates your keypad entries into some code that routes your call to another person. For data, I see Vibe as a key enabling technology that can unleash the incredible potential locked up in our information to make the world a better place.
