Data Visibility From the Source to Hadoop and Beyond with Cloudera and Informatica Integration


This is a guest post by Amr Awadallah, Founder, CTO at Cloudera, Inc.

It takes a village to build mainstream big data solutions. We often get so caught up in Hadoop use cases and customer successes that sometimes we don’t talk enough about the innovative partner technologies and integrations that enable our customers to put the enterprise data hub at the core of their data architecture and innovate with confidence. Cloudera and Informatica have been working together to integrate our products to enable new levels of productivity and lower deployment and production risk.

Going from Hadoop to an enterprise data hub means a number of things. It means that you recognize the business value of capturing and leveraging all your data for exploration and analytics. It means you’re ready to make the move from Hadoop pilot project to production. And it means your data is important enough that it’s worth securing and making data pipelines visible. It’s that visibility layer, and in particular the unique integration between Cloudera Navigator and Informatica, that I want to focus on in this post.

The era of big data has ushered in increased regulations in a number of industries – banking, retail, healthcare, energy – most of which deal in how data is managed throughout its lifecycle. Cloudera Navigator is the only native end-to-end solution for governance in Hadoop. It provides visibility for analysts to explore data in Hadoop, and enables administrators and managers to maintain a full audit history for HDFS, HBase, Hive, Impala, Spark and Sentry, and then run reports on data access for auditing and compliance. The integration of Informatica Metadata Manager in the Big Data Edition and Cloudera Navigator extends this level of visibility and governance beyond the enterprise data hub.
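
For teams that want to fold these audit trails into their own compliance reporting, Navigator also exposes audit events programmatically. The sketch below is illustrative only: the endpoint path, port, parameters and field names are assumptions that should be verified against your Navigator release.

```python
import requests

# Illustrative sketch: pull recent Hive access events from the Cloudera
# Navigator audit API. The endpoint path, default port, and field names
# are assumptions -- verify them against your Navigator documentation.
NAVIGATOR_AUDITS = "http://navigator-host:7187/api/v9/audits"

params = {
    "query": "service==hive",      # restrict to Hive audit events
    "startTime": 1412121600000,    # epoch milliseconds
    "endTime": 1412208000000,
    "limit": 100,
    "format": "JSON",
}

resp = requests.get(NAVIGATOR_AUDITS, params=params,
                    auth=("admin", "admin"), timeout=30)
resp.raise_for_status()

for event in resp.json():
    # Typical audit fields: who touched what, and how.
    print(event.get("username"), event.get("operation"), event.get("resource"))
```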

Today, only Informatica and Cloudera provide end-to-end data lineage from source systems through Hadoop, and into BI/analytic and data warehouse systems. And you can view it from a single pane within Informatica.

This is important because Hadoop, and the enterprise data hub in particular, doesn’t function in a silo. It’s an integrated part of a larger enterprise-wide data management architecture. The better the insight into where data originated, where it traveled, who had access to it and what they did with it, the greater our ability to report and audit. No other combination of technologies provides this level of audit granularity.

More than that, the visibility Cloudera and Informatica provide gives our joint customers the ability to confidently stand up an enterprise data hub as part of their production enterprise infrastructure, because they can verify the integrity of the data that undergirds their analytics. I encourage you to check out a demo of the Informatica-Cloudera Navigator integration at this link: http://infa.media/1uBpPbT

You can also check out a demo and learn a little more about Cloudera Navigator and the Informatica integration in the recorded TechTalk hosted by Informatica at this link:

http://www.informatica.com/us/company/informatica-talks/?commid=133311


Building Engagement through Service and Support

This is a guest post by Tom Petrocelli, Research Director of Enterprise Social, Mobile and Cloud Applications at Neuralytix.


A product is not just an object or the bits that comprise software or digital media; it is an entire experience. The complete customer experience is vital to the overall value a customer derives from their product and the on-going relationship between the customer and the vendor. The customer experience is enhanced through a series of engagements over a variety of digital, social, and personal channels. Each point of contact between a vendor and customer is an opportunity for engagement. These engagements over time affect the level of satisfaction customers have with the vendor relationship.

Service and support is a critical part of this engagement strategy. Retail and consumer goods companies recognize the importance of support to the overall customer relationship. Consequently, these companies have integrated their before- and after-purchase support into their multi-channel and omni-channel marketing strategies. While retail and consumer products companies have led the way in making support an integral part of ongoing customer engagement, B2B companies have begun to do the same. Enterprise IT companies, which are primarily B2B companies, have been expanding their service and support capabilities to create more engagement between their customers and themselves. Service offerings have expanded to include mobile tools, analytics-driven self-help, and support over social media and other digital channels. The goal of these investments has been to make interactions more productive for the customer, strengthen relationships through positive engagement, and to gather data that drives improvements in both the product and service.

A great example of an enterprise software company that understands the value of customer engagement through support is Informatica. Known primarily for their data integration products, Informatica has been quickly expanding their portfolio of data management and data access products over the past few years. This growth in their product portfolio has introduced many new types of customers to Informatica and created more complex customer relationships. For example, the new Springbok product is aimed at making data accessible to the business user, a new type of interaction for Informatica. Informatica has responded with a collection of new service enhancements that augment and extend existing service channels and capabilities.

What these moves say to me is that Informatica has made a commitment to deeper engagement with customers. For example, Informatica has expanded the avenues through which customers can get support. By adding social media and mobile capabilities, they are creating additional points of presence that address customer issues when and where customers are. Informatica provides support on the customers’ terms instead of requiring customers to do what is convenient for Informatica. Ultimately, Informatica is creating more value by making it easier for customers to interact with them. The best support is that which solves the problem quickest with the least amount of effort. Intuitive knowledge base systems, online support, sourcing answers from peers, and other tools that help find solutions immediately are more valued than traditional phone support. This is the philosophy that drives the new self-help portal, predictive escalation, and product adoption services.

Informatica is also shifting the support focus from products to business outcomes. They manage problems holistically rather than simply creating product band-aids. This shows a recognition that technical problems with data are actually business problems that have broad effects on a customer’s business. Contrast this with the traditional approach to support, which focuses on fixing a technical issue but doesn’t necessarily address the wider organizational effects of those problems.

More than anything, these changes are preparation for a very different support landscape. With the launch of the Springbok data analytics tool, Informatica’s support organization is clearly positioning itself to help business analysts and similar semi-technical end-users. The expectations of these end-users have been set by consumer applications. They expect more automation and more online resources that help them to use and derive value from their software and are less enamored with fixing technical problems.

In the past, technical support was mostly charged with solving immediate technical issues. That’s still important, since the products have to work first to be useful. Now, however, support organizations have an expanded mission: to be part of the overall customer experience and to enhance overall engagement. The latest enhancements to the Informatica support portfolio reflect this mission and prepare them for the next generation of non-IT Informatica customers.


Data Integration Webinar Follow-Up: By Our First Strange and Fatal Interview


This is a guest author post by Philip Howard, Research Director, Bloor Research.

I recently posted a blog about an interview-style webcast I was doing with Informatica on the uses and costs associated with data integration tools.

I’m not sure that the poet John Donne was right when he said that it was strange, let alone fatal. Somewhat surprisingly, I have had a significant amount of feedback following this webinar. I say “surprisingly” because the truth is that I very rarely get direct feedback. Most of it, I assume, goes to the vendor. So, when a number of people commented to me that the research we conducted was both unique and valuable, it was a bit of a thrill. (Yes, I know, I’m easily pleased).

There were a number of questions that arose as a result of our discussions. Probably the most interesting was whether moving data into Hadoop (or some other NoSQL database) should be treated as a separate use case. We certainly didn’t include it as such in our original research. In hindsight, I’m not sure that the answer I gave at the time was fully correct. I acknowledged that you certainly need some different functionality to integrate with a Hadoop environment, and that some vendors have more comprehensive capabilities than others when it comes to Hadoop; the same also applies (but with different suppliers) when it comes to integrating with, say, MongoDB, Cassandra or graph databases. However, as I pointed out in my previous blog, functionality is ephemeral. And just because a particular capability isn’t supported today doesn’t mean it won’t be supported tomorrow. So that doesn’t really affect use cases.

However, where my reply was inadequate was that I only referenced Hadoop as a platform for data warehousing, stating that moving data into Hadoop was not essentially different from moving it into Oracle Exadata or Teradata or HP Vertica. And that’s true. What I forgot was the use of Hadoop as an archiving platform. As it happens, we didn’t have an archiving use case in our survey either. Why not? Because archiving is essentially a form of data migration – you have some information lifecycle management, access and security issues that are relevant to archiving once it is in place, but that is after the fact: the process of discovering and moving the data is exactly the same as with data migration. So: my bad.

Aside from that little caveat, I quite enjoyed the whole event. Somebody or other (there’s always one!) didn’t quite get how quantifying the number of end points in a data integration scenario was a surrogate measure for complexity (something we took into account), and so I had to explain that. Of course, it’s not perfect as a metric, but the only alternative is to ask eye-of-the-beholder type questions, which aren’t very satisfactory.
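
For readers who like to see the metric concretely, here is a hypothetical illustration of end-point counting as a complexity surrogate; the scenarios and scoring below are mine, not part of the Bloor survey.

```python
# Hypothetical illustration: score an integration scenario by counting its
# distinct end points (sources plus targets), rather than asking
# respondents to rate "complexity" in the eye of the beholder.
def complexity_score(sources, targets):
    """Number of distinct end points in the scenario."""
    return len(set(sources) | set(targets))

simple = complexity_score(["CRM"], ["warehouse"])
messy = complexity_score(["CRM", "ERP", "weblogs", "MongoDB"],
                         ["warehouse", "Hadoop"])
print(simple, messy)  # 2 vs. 6 -- the second scenario scores as more complex
```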

Anyway, if you want to listen to the whole thing you can find it HERE.


Getting Value Out of Data Integration

This post is by Philip Howard, Research Director, Bloor Research.


Live Bloor Webinar, Nov 5

One of the standard metrics used to support buying decisions for enterprise software is total cost of ownership. Typically, the other major metric is functionality. However, functionality is ephemeral. Not only does it evolve with every new release, but while particular features may be relevant to today’s project, there is no guarantee that those same features will be applicable to tomorrow’s needs. A broader metric than functionality is capability: how suitable is this product for a range of different project scenarios, and will it support both simple and complex environments?

Earlier this year Bloor Research published some research into the data integration market, which investigated exactly these issues: how often were tools reused, how many targets and sources were involved, and for what sort of projects were products deemed suitable? We then compared these with the total cost of ownership figures that we also captured in our survey. I will be discussing the results of our research live with Kristin Kokie, who is the interim CIO of Informatica, on Guy Fawkes’ Day (November 5th). I don’t promise anything explosive, but it should be interesting and I hope you can join us. The discussion will be vendor neutral (mostly: I expect that Kristin has a degree of bias).

To Register for the Webinar, click Here.


Build Your Modern Data Architecture with Hadoop and Informatica


This is a guest post by John Kreisa, Vice President Strategic Marketing, Hortonworks

Today, 80% of the effort in Big Data projects is related to extracting, transforming and loading data (ETL). Hortonworks and Informatica have teamed up so that organizations can use the power of Informatica Big Data Edition, together with their existing skills, to improve the efficiency of these operations and better leverage their resources in a modern data architecture (MDA).

Next Generation Data Management

The Hortonworks Data Platform and Informatica BDE enable organizations to optimize their ETL workloads with long-term storage and processing at scale in Apache Hadoop. With Hortonworks and Informatica, you can:

• Leverage all internal and external data to achieve the full predictive power that drives the success of modern data-driven businesses.
• Optimize the entire big data supply chain on Hadoop, turning data into actionable information to drive business value.

Key Advantages

Imagine a world where you would have access to your most strategic data in a timely fashion, no matter how old the data is, where it is stored, or in what format. By leveraging Hadoop’s power of distributed processing, organizations can lower the costs of data storage and processing and support large-scale data distribution with high throughput and concurrency.

Overall, the alignment between business and IT grows. The Big Data solution based on Informatica and Hortonworks allows for a complete data pipeline to ingest, parse, integrate, cleanse, and prepare data for analysis natively on Hadoop, thereby increasing developer productivity by 5x over hand-coding.
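
To make the hand-coding comparison concrete, here is a rough sketch of the kind of ingest-cleanse-prepare step that would otherwise be written by hand on Hadoop. It is generic PySpark, not Informatica BDE code, and the paths and column names are invented for illustration.

```python
# Generic PySpark sketch of a hand-coded pipeline step on Hadoop --
# the sort of work a visual data-pipeline tool aims to replace.
# Paths and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-offload-sketch").getOrCreate()

raw = spark.read.json("hdfs:///landing/orders/")          # ingest + parse

clean = (raw
         .dropDuplicates(["order_id"])                    # cleanse
         .withColumn("order_ts", F.to_timestamp("order_ts"))
         .filter(F.col("amount") > 0))                    # validate

clean.write.mode("overwrite").parquet("hdfs:///curated/orders/")  # prepare
```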

Where Do We Go From Here?

At the end of the day, Big Data is not about the technology. It is about the deep business and social transformation every organization will go through. The possibilities to make more informed decisions, identify patterns, proactively address fraud and threats, and predict pretty much anything are endless.

This transformation will happen as the technology is adopted and leveraged by more and more business users. We are already seeing the transition from 20-node clusters to 100-node clusters and from a handful of technology-savvy users relying on Hadoop to hundreds of business users. Informatica and Hortonworks are accelerating the delivery of actionable Big Data insights to business users by automating the entire data pipeline.

Try It For Yourself

On September 10, 2014, Informatica announced a 60-day trial version of Informatica Big Data Edition for the Hortonworks Sandbox. This free trial enables you to download and test out the Big Data Edition on your notebook or spare computer and experience your own personal Modern Data Architecture (MDA).

If you happen to be at Strata this October 2014, please meet us at our booths: Informatica #352 and Hortonworks #117. Don’t forget to participate in our Passport Program and join our session at 5:45 pm ET on Thursday, October 16, 2014.


Increasing Data Complexity Now Calls for Enterprise Data Integration

I had the honour recently of being asked to give the opening keynote presentation at Informatica’s Information Potential Day in London. It was a really well attended event and I enjoyed the discussions during the breaks and over lunch with many Informatica customers.

The presentation I gave was entitled Information Potential in the Information Age. In it I tried to get several messages over to the audience. The gist was as follows. We are now in an internet age where the customer is king. Customers can compare prices and switch loyalties at the click of a mouse or the touch of a screen on a mobile device. Competition for wallet share is coming from everywhere, with new web businesses rewiring some industries. It is therefore not surprising that the recent 16th annual survey of CEOs by PwC showed that customer growth, retention and loyalty were top of the agenda, along with the need to improve operational effectiveness.

These business priorities are driving new information requirements. On the customer front there is a need for new data sources for deeper customer insight, a trusted 360° view of the customer, integrated customer master data, and a customer-oriented data warehouse. Operational effectiveness, on the other hand, requires on-demand information services, lower-latency data, event-driven data integration, stream processing and decision management, as well as operational BI at the point of need. Almost all of these requirements are dependent on one thing: data integration.

Yet, having been in this industry and in data management for over 32 years now, I can’t remember a time when I have seen data so widely distributed. And it is increasing. We have multiple instances of OLTP applications in different geographies or business units, with data being sent everywhere. In the analytical world, more and more data warehouse appliances are appearing, creating ‘islands’ of analytical data. Data is now in the cloud as well as on premise, and the arrival of Big Data is adding more platforms, such as NoSQL DBMSs and Hadoop, into the mix. Unstructured content is still scattered across the enterprise and, with new data sources like social media data, machine-generated clickstream, GPS and sensor data now upon us, we are facing a data deluge.

So while we have the potential to act on deeper insights to improve customer growth and operational effectiveness, the only way we can produce this information is to integrate data. This need is now everywhere. We have made progress over the years in some areas. Data warehousing and master data management both require data integration. Cloud computing also needs it, to get data out to cloud applications or collect it from those applications. Big data needs it too. We hand-coded data integration 20 years ago, before the emergence of data integration software. My question, with the advent of Big Data, is why is it ‘cool’ to hand-write code again just because it is on new technology platforms like Hadoop? Surely we could be more productive leveraging existing investments in data integration, as long as they extend their support into the Big Data arena. And before I forget, we are already at the point where the business user needs data integration. The emergence of multiple data warehouses in organizations has meant that almost every business analyst I know using a self-service BI tool to produce insights and dashboards needs to integrate data from multiple underlying data sources.

So while we have made progress, we have often implemented data integration on a project-by-project basis, whether for a data warehouse, for a master data entity, for cloud or for big data, let alone everything else we need it for. The danger here is that we miss a major opportunity to re-use metadata, whereby we define data names and data integration mappings once and then use them to provision data wherever it is needed. To do that we need common metadata, a common data management platform and enterprise data integration. We are surely at the point where the need for enterprise data integration is upon us.
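
As a toy illustration of that define-once idea (my own sketch, not any particular product’s API), imagine a single shared mapping that every provisioning job applies, instead of each project re-coding it:

```python
# A toy sketch of "define once, provision anywhere": one shared mapping
# definition drives provisioning to several targets. Names and targets
# are illustrative, not any particular product's metadata format.
CUSTOMER_MAPPING = {
    "cust_id":   "customer_identifier",
    "cust_nm":   "customer_name",
    "cust_mail": "customer_email",
}

def provision(record, mapping):
    """Apply the shared mapping to a source record."""
    return {target: record[source] for source, target in mapping.items()}

source_row = {"cust_id": 42, "cust_nm": "Ada Lovelace",
              "cust_mail": "ada@example.com"}

# The same mapping feeds the warehouse load, the MDM hub and the Hadoop
# pipeline, instead of being re-coded in each project.
for destination in ("warehouse", "mdm_hub", "hadoop"):
    print(destination, provision(source_row, CUSTOMER_MAPPING))
```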

——————————————

Mike Ferguson is an independent analyst specializing in BI/Analytics, Data Management and Big Data. He can be reached at mferguson@intelligentbusiness.biz


What’s the biggest obstacle to influencing traveler purchasing decisions in the hospitality industry?

Did you see the disturbing traveler survey results in the HotelExecutive.com article “The Evolution of Hotel Loyalty Programs: What Guests Expect in 2013”? It cited a study by Deloitte, which revealed that 30% of hotel loyalty program members are “at risk” of switching their preferred brand.

Also, although 50% of leisure travelers are members of a hotel loyalty program, they are not loyal to one brand. Sadly, most hotel loyalty programs have little or no impact on traveler purchasing decisions.


Less than 8% of respondents to a travel survey indicated they stay at the same hotel brand.

Why? Most hotel loyalty programs are not differentiated. Guest segmentation is unsophisticated. For example, a guest who stays 11 nights is treated the same way as someone who stays 49 nights, because they fall under the same loyalty tier. But if a guest who stays 20 nights reserves a higher-cost room than a similar guest, shouldn’t you treat them differently?
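
As a toy illustration (the thresholds and labels below are invented), combining stay frequency with room-rate spend already separates the guests in the examples above:

```python
# Toy refinement of tier-only segmentation: combine nights stayed with
# average room rate, so the 11-night and 49-night guests (and the
# premium-room guest) land in different segments. Thresholds are
# illustrative only.
def segment(nights, avg_rate):
    frequency = "frequent" if nights >= 25 else "occasional"
    spend = "premium" if avg_rate >= 250 else "standard"
    return f"{frequency}/{spend}"

print(segment(11, 120))   # occasional/standard
print(segment(49, 120))   # frequent/standard  -- no longer the same segment
print(segment(20, 320))   # occasional/premium -- higher-rate guest surfaces
```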

Based on my 15 years of experience working with sales and marketing executives at leading hotels, resorts, and other hospitality organizations, I see a huge opportunity for those who want to increase market share. The key is to build a differentiated loyalty program, based on more refined and meaningful segments, which anticipate travelers’ personalized needs and allow you to deliver a consistent experience across all touchpoints.

So, what is holding sales and marketing back from creating more refined and meaningful segments for their loyalty programs? It’s not the applications that stand in their way. Most have implemented business intelligence, CRM systems, sales automation, service automation, and marketing automation.

The biggest obstacle is managing hospitality customer data. It’s distributed across the enterprise: locally at the hotels, in SaaS environments, or at the corporate data center. Take a look at this complexity:

  • 1,000,000s of Guests
  • 1,000s of Business Contacts
  • 1,000s of Corporate Client Companies
  • 1,000s of Meeting Planners
  • 100s of Meeting Planning Companies
  • 100s of Travel Agencies
  • 1,000s of Travel Agents
  • 100s of Brands
  • 1,000s of Properties
  • 1,000,000,000s of Interactions
  • 1,000,000s of Transactions

This is an exponentially complex problem that can’t be managed in documents. Without a trusted, unified view into the amazingly complex network of hospitality customers, you can’t segment in a more refined and meaningful way.

Even though your valuable customer data is disconnected today, you can take steps to start connecting it to create a unified customer view. Others are already doing it. See Hyatt’s presentation at Informatica World for an example of how one innovative global hospitality organization mastered their hospitality customer data to power their sales and marketing vision. Hyatt implemented data integration, data quality, and master data management (MDM) solutions to gain a unified customer view.
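
Under the hood, the heart of that unified view is a match-and-merge step. The sketch below is a deliberately minimal illustration of the idea using simple string similarity; production MDM match rules are far more sophisticated, and the records and weights here are invented.

```python
# Minimal sketch of the record-matching step behind a unified customer
# view: fuzzy-compare guest records from two systems before merging.
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pms_guest = {"name": "Jonathan Smith", "email": "jsmith@example.com"}
crm_guest = {"name": "Jon Smith",      "email": "jsmith@example.com"}

# Weighted match score across attributes (weights are illustrative).
score = 0.6 * similarity(pms_guest["name"], crm_guest["name"]) \
      + 0.4 * similarity(pms_guest["email"], crm_guest["email"])

if score > 0.85:
    print("probable match -> merge into one golden record", round(score, 2))
```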

With this approach, you can:

  • Take disparate information for guests, corporate customers, and other customer types to create a trusted unified view of hospitality customers
  • Manage business customer account hierarchies and household relationships so sales and marketing teams can visualize interactions between customers. For example, perhaps some of your top 100 business customers are also your top 100 leisure guests.
  • Integrate this customer repository with transaction data like stays, reservations, events, food and beverage, customer service, web behavior, and social media to gain a trusted longitudinal view of the customer experience—from reservation through check-out.
  • Create more refined and meaningful segments to gain key customer insights and better anticipate travelers’ personalized needs, and deliver a consistent experience across all touch points.

Are you interested in understanding more about this proven way to create a differentiated loyalty program and increase revenue and market share? If so, please leave a comment below and I will respond.

——————————————————————————————-


Larry Goldman is the president of AmberLeaf, a customer intelligence consultancy. With 18 years of experience helping clients make strategic use of their information, Larry excels at helping sales and marketing teams implement best practices for marketing strategy, customer analytics, customer segmentation, data warehousing, and database marketing. He has delivered business-value focused solutions for Business Intelligence, CRM, Master Data Management (MDM), and Database Marketing.

In the past four years, Larry has focused on the hospitality industry. His industry expertise also includes Telecommunications (Wireless and Long Distance), High Technology, Broadcasting, Newspaper, On-Line Media Content providers, Cable, Telematics (In-Car Communications and Services), Education, Manufacturing, Retail (catalog, e-commerce, brick and mortar), Consumer Packaged Goods, and Financial Services (Insurance, Brokerage, Mortgage). Larry is an author and popular speaker on the topic of customer centricity and the use of analytics to improve business results.

 


Informatica and Heiler – Better Together

Access to information has always been extremely important to people and organizations. In an increasingly complex and interconnected world, data is an essential competitive advantage for companies. With rapidly growing data volumes, increased complexity, and high market speed, our goal is simple: to easily connect people and data.

Turning data into business outcomes has always been our value proposition at Heiler Software. Unlocking the value of data and its information potential is totally in line with Informatica’s positioning. Unleashing the potential of information will help to make the careers of our customers, partners and employees even better.

From the beginning of our conversations, from the announcement of the acquisition in October 2012 until today, Informatica’s managers and employees have always stuck to the promises they made. That is a great commitment to our employees and customers. Now Informatica has announced the exciting news that the acquisition of Heiler is complete.

Heiler is now a part of the Informatica family. Our entire team is looking forward to the future with Informatica. For Heiler, the door for an exciting and successful future is wide open. Informatica will provide our customers and our employees a promising perspective in a dynamic industry.

Hundreds of customers rely on Informatica’s multi-domain MDM platform to manage customer, location, and asset data, and to synchronize accurate master data across operational and analytic systems. I am sure Informatica is committed to being a trusted partner and will work to ensure success with all of Heiler’s products.

Heiler has just released PIM 7 to speed up time to market for all products, across all sales and marketing channels. Procurement 7.1 has also been available since March 2013. Informatica is known for innovation, and I am convinced that Informatica will continue investing in our business. The goal is to enable real-time commerce business processes and create a unique customer experience for our customers’ business. Our award-winning PIM fits into the Universal MDM strategy to deliver on one vision: enabling our customers to offer the right product, for the right customer, from the right supplier, at the right time, via the right channels and locations. It is all about inspiring.

Joining forces will allow our customers to leverage Informatica’s expertise in data quality and data integration to deliver greater business value. With Informatica’s Data Quality offerings, our customers will be able to further accelerate the introduction of their products to market. Additionally, customers will be able to easily onboard data from their suppliers, then distribute it to their customers and partners electronically with Informatica B2B. We share a common goal: to establish the combination of Informatica MDM and Heiler PIM as the gold standard in the industry.

Another benefit of the acquisition is that all customers will receive world-class support from Informatica’s Global Customer Support organization, which delivers a comprehensive set of support programs including 24×7 support across 10 regional support centers. Customers have ranked Informatica as #1 in customer satisfaction for seven years in a row. In addition, Informatica’s strong global partner ecosystem brings the right resources to solve business and technical challenges across more industries.

With this important milestone reached, my mission as CEO of Heiler Software AG is fulfilled. Personally, I’m going to stay connected to Informatica and I am excited to get involved in the future of this excellent and innovative company.

The future of Universal MDM is close to my heart.

———————————————————-


Rolf J. Heiler, born 1959, married with three children, graduated in 1982 in business management, majoring in IT and process organization. In 1987, he founded Heiler Software GmbH, which has been quoted on the stock exchange in the “New Market” sector since 2000.


Title I HOPE at Informatica World 2013

Title I HOPE’s (Homeless Outreach Program for Education) mission is to work to identify and enroll homeless students in the Clark County School District, collaborate with school personnel on homeless educational rights, and inform parents of the options available to their children under the McKinney-Vento Homeless Education Act. Over 6,800 students in grades K-12 have been identified as homeless in our community.

The HOPE office strives to connect youth with resources and support services that will keep them in school. Services that are needed for our students range from school supplies to tutoring/mentoring to food. The HOPE staff partners with community providers to assist children with backpacks, food, clothing, shoes and hygiene items. The donations provide students with basic needs, which encourages them to come to school ready to learn.

Informatica World has been such a gracious contributor to the HOPE program. Last year the organization donated 1,000 food bags, which ensured that all of our students who were tutored had a snack so they could focus on their work. You can check out the news story here about our great results.

Informatica also provided us with six iPads, which were integrated into the A Place Called HOPE high school resource centers. Youth were able to experience the latest technology because of Informatica World. The support and supplies provided by the organization and the participants of the conference will make a true difference in the lives of students who are struggling with basic needs. The items donated will assist students in focusing on their education and will help them to be successful in life.

The Informatica World 2013 event this week is truly making a difference in the lives of thousands of homeless students in our community.  Title I HOPE thanks everyone for their generosity and willingness to give.

————————————

This was a guest blog penned by Title I HOPE.


Informatica’s Vibe virtual data machine can streamline big data work and allow data scientists to be more efficient

Informatica has introduced Vibe, an embeddable engine not only for transformation but also for data quality, data profiling, data masking and a host of other data integration tasks. It will have a meaningful impact on the data scientist shortage.

Some clear economic facts are already apparent in the current world of data. Hadoop provides a significantly less expensive platform for gathering and analyzing data; cloud computing (potentially) is a more economical computing location than on-premises, if managed well. These are clearly positive developments. On the other hand, the human resources required to exploit these new opportunities are actually quite expensive. When there is greater demand than can be met in the short term for a hot product, suppliers put customers “on allocation” to manage the distribution to the most strategic customers.

This is the situation with “data scientists,” this new breed of experts with quantitative skills, data management skills, presentation skills and deep domain expertise. Current estimates are that there are 60,000 – 120,000 unfilled positions in the US alone. Naturally, data scientists are “allocated” to the most critical (economically lucrative) efforts, and their time is limited to those tasks that most completely leverage their unique skills.

To address this shortage, industry turns to universities to develop curricula to manufacture data scientists, but this will take time. In the meantime, salaries for data scientists are very high. Unfortunately, most data science work involves a great deal of effort that does not require data science skills, especially in the areas of managing the data prior to the insightful analytics. Some estimates are that data scientists spend 50-80% of their time finding and cleaning data, managing their computing platforms and writing programs. Reducing this effort with better tools can not only make data scientists more effective, it can also have an impact on the most expensive component of big data – human resources.

Informatica today introduced Vibe, its embeddable virtual data machine, to do exactly that. Informatica has, for over 20 years, provided tools that allow developers to design and execute data transformations without the need for writing or maintaining code. With Vibe, this capability is extended to include data quality, masking and profiling, and the engine itself can be embedded in the platforms where the work is performed. In addition, the engine can generate separate code from a single data management design.

In the case of Hadoop, Informatica designers can continue to operate in the familiar design studio and have Vibe generate the code for whatever platform is needed. In this way, an Informatica developer can build these data management routines for Hadoop without learning Hadoop or writing code in Java. The real advantage is that the data scientist is freed from work that can be performed by those in lower pay grades, and that work can be parallelized too – multiple programmers and integration developers to one data scientist.
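
As a conceptual illustration of that design-once idea (this is my sketch of the pattern, not Vibe’s actual interface), one declarative design can either be interpreted by an embedded engine or compiled into code for another platform:

```python
# Toy sketch of a "virtual data machine": one declarative design,
# two execution paths. All names and formats here are invented.
DESIGN = [
    ("mask",   {"field": "ssn"}),
    ("filter", {"field": "age", "min": 18}),
]

def execute(design, rows):
    """Interpret the design directly -- the embedded-engine case."""
    for op, cfg in design:
        if op == "mask":
            rows = [{**r, cfg["field"]: "***"} for r in rows]
        elif op == "filter":
            rows = [r for r in rows if r[cfg["field"]] >= cfg["min"]]
    return rows

def to_hiveql(design, table="people"):
    """Compile the same design to HiveQL text -- the generated-code case."""
    masked = {cfg["field"] for op, cfg in design if op == "mask"}
    where = " AND ".join(f"{cfg['field']} >= {cfg['min']}"
                         for op, cfg in design if op == "filter")
    cols = ", ".join(f"'***' AS {f}" for f in masked) or "*"
    return f"SELECT {cols} FROM {table} WHERE {where};"

rows = [{"ssn": "123-45-6789", "age": 34}, {"ssn": "987-65-4321", "age": 12}]
print(execute(DESIGN, rows))   # engine embedded in-process
print(to_hiveql(DESIGN))       # same design, code generated for Hadoop
```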

Vibe is a major innovation for Informatica that provides many interesting opportunities for its customers. Easing the data scientist problem is only one of them.

———————————


This is a guest blog penned by Neil Raden, a well-known industry figure as an author, lecturer and practitioner. He has in-depth experience as a developer, consultant and analyst in all areas of Analytics and Decision Services, including Big Data strategy and implementation, Business Intelligence, Data Warehousing, Statistical/Predictive Modeling, Decision Management, and IT systems integration including assessment, architecture, planning, project management and execution. Neil has authored dozens of sponsored white papers and articles, is a blogger, and is co-author of “Smart (Enough) Systems” (Prentice Hall, 2007). He has 25 years of experience as an actuary, software engineer and systems integrator.
