Category Archives: Big Data

Remembering Big Data Gravity – PART 2

I ended my previous blog wondering if awareness of Data Gravity should change our behavior. While Data Gravity adds Value to Big Data, I find that the application of that Value is underexplained.

Exponential growth of data has naturally led us to want to categorize it into facts, relationships, entities, etc. This sounds very elementary. While this happens so quickly in our subconscious minds as humans, it takes significant effort to teach this to a machine.

A friend tweeted this to me last week: I paddled out today, now I look like a lobster. Since this tweet, Twitter has inundated my friend and me with promotions from Red Lobster. It is because the machine deconstructed the tweet: paddled <PROPULSION>, today <TIME>, like <PREFERENCE> and lobster <CRUSTACEANS>. While putting these together, the machine decided that the keyword was lobster. You and I both know that my friend was not talking about lobsters.

You may think that this may be just a funny edge case. You can confuse any computer system if you try hard enough, right? Unfortunately, this isn’t an edge case. The 140-character limit has not just changed people’s tweets, it has changed how people talk on the web. More and more information is communicated in smaller and smaller amounts of language, and this trend is only going to continue.

When will the machine understand that “I look like a lobster” means I am sunburned?

I believe the reason that there are not hundreds of companies exploiting machine-learning techniques to generate a truly semantic web is the lack of weighted edges in publicly available ontologies. Keep reading, it will all make sense in about 5 sentences. Lobster and Sunscreen are 7 hops away from each other in DBpedia – way too many to draw any correlation between the two. For that matter, any article in Wikipedia is connected to any other article within about 14 hops, and that’s the extreme. Completely unrelated concepts are often just a few hops from each other.

But by analyzing massive amounts of both written and spoken English text from articles, books, social media, and television, it is possible for a machine to automatically draw a correlation and create a weighted edge between the Lobsters and Sunscreen nodes that effectively short circuits the 7 hops necessary. Many organizations are dumping massive amounts of facts without weights into our repositories of total human knowledge because they are naïvely attempting to categorize everything without realizing that the repositories of human knowledge need to mimic how humans use knowledge.
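To make the short-circuit effect concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the intermediate nodes (`n1` to `n6`) are placeholders standing in for the 7 hops, the helper names `hops` and `add_edge` are hypothetical, and the learned edge is simply added by hand rather than derived from any real text analysis.

```python
from collections import deque


def hops(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def add_edge(graph, a, b):
    """Add an undirected edge between a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


# Placeholder chain standing in for the 7 hops; the intermediate nodes are made up.
chain = ["Lobster", "n1", "n2", "n3", "n4", "n5", "n6", "Sunscreen"]
graph = {}
for a, b in zip(chain, chain[1:]):
    add_edge(graph, a, b)

before = hops(graph, "Lobster", "Sunscreen")

# Hypothetical edge learned from text co-occurrence, linking the two concepts directly.
add_edge(graph, "Lobster", "Sunscreen")
after = hops(graph, "Lobster", "Sunscreen")

print(before, after)
```

The sketch only counts hops; a fuller version would also store a weight on the learned edge so that strongly associated concepts rank ahead of weakly associated ones.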

For example – if you hear the name Babe Ruth, what is the first thing that pops to mind? Roman Catholics from Maryland born in the 1800s or Famous Baseball Player?

If you look in Wikipedia today, he is listed under 28 categories, each of them with the same level of attachment. 1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore, Maryland | Vaudeville performers.

Now imagine how confused a machine would get when the distance of unweighted edges between nodes is used as a scoring mechanism for relevancy.

If I were to design an algorithm that uses weighted edges (on a scale of 1-5, with 5 being the highest), the same search would yield a much more obvious result.

1895 births [2] | 1948 deaths [2] | American League All-Stars [4] | American League batting champions [4] | American League ERA champions [4] | American League home run champions [4] | American League RBI champions [4] | American people of German descent [2] | American Roman Catholics [2] | Babe Ruth [5] | Baltimore Orioles (IL) players [4] | Baseball players from Maryland [3] | Boston Braves players [4] | Boston Red Sox players [5] | Brooklyn Dodgers coaches [4] | Burials at Gate of Heaven Cemetery [2] | Cancer deaths in New York [2] | Deaths from esophageal cancer [1] | Major League Baseball first base coaches [4] | Major League Baseball left fielders [3] | Major League Baseball pitchers [5] | Major League Baseball players with retired numbers [4] | Major League Baseball right fielders [3] | National Baseball Hall of Fame inductees [5] | New York Yankees players [5] | Providence Grays (minor league) players [3] | Sportspeople from Baltimore, Maryland [1] | Vaudeville performers [1].
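As a rough illustration of how a machine might use such weights, here is a minimal Python sketch. It assumes the weights are stored in a plain dictionary; the names `category_weights` and `top_categories` and the cutoff of 4 are hypothetical, and only a handful of the categories are included.

```python
# Hypothetical subset of the weighted categories (1 = weak attachment, 5 = strong).
category_weights = {
    "1895 births": 2,
    "American Roman Catholics": 2,
    "Baseball players from Maryland": 3,
    "Major League Baseball pitchers": 5,
    "National Baseball Hall of Fame inductees": 5,
    "New York Yankees players": 5,
    "Vaudeville performers": 1,
}


def top_categories(weights, minimum=4):
    """Return categories at or above `minimum`, strongest first, ties alphabetical."""
    strong = [(name, w) for name, w in weights.items() if w >= minimum]
    return [name for name, _ in sorted(strong, key=lambda item: (-item[1], item[0]))]


if __name__ == "__main__":
    for name in top_categories(category_weights):
        print(name)
```

With weights in place, "Famous Baseball Player" style categories surface first, while birth year and religion fall below the cutoff.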

Now the machine starts to think more like a human. The above example forces us to ask ourselves about the relevancy, a.k.a. Value, of the response. This is where I think Data Gravity becomes relevant.

You can contact me on twitter @bigdatabeat with your comments.


Can Big Data Live Up To Its Promise For Retailers?

Was Sam Walton, founder of Walmart, talking about Big Data and Analytics in the 90s when he said, “People think we got big by putting big stores in small towns. Really, we got big by replacing inventory with information”? Walmart clearly understood the value of the large volumes of data they had access to and turned it into competitive advantage.

As retailers move from looking in the rearview mirror (what happened) to the road ahead (what will happen), they have turned to Big Data and Analytics for answers. While Big Data holds great promise for retailers, many are skeptical. Retailers are already drinking from the data fire hose, whether it’s transaction data recording every product sold to every customer across all channels, research data covering detailed consumer profiles, or web log and social data. The questions retailers are asking: Will the investment drive more revenue, increase customer loyalty and create a more rewarding customer experience? Will I gain a deeper insight into customer transactions and interactions across the organization? Can we use existing resources and infrastructure?

The answer is yes. Big Data presents the opportunity to better analyze everything from customer shopping behaviors at each stage of the purchase journey, to inventory planning, to delivering relevant and personalized offers. Analyzing how shoppers found your products, how long they spent browsing product pages and which products they added to their basket provides greater insight into the decision process they went through before purchase, and helps retailers quickly identify cross-sell and up-sell opportunities in real time. In addition, combining transaction data with what your customers are saying on social channels (ratings, likes, dislikes, what’s trending, etc.) can feed into the decisions you make on placing the right product in the right store at the right price, and ultimately deliver highly personalized and contextual offers to customers.

Getting value from Big Data

Turning Big Data into actionable insight is not just about dumping data into a “Data Lake”, pointing an analytics tool at it and saying job done! Retailers need to take a number of steps to profit from Big Data and Analytics.

  • Firstly, you need to gather data from all available sources, in batch or real time, internal and external, and from an ever-increasing number of devices (beacons, mobile devices). Once you have gathered the data, it needs to be connected, validated and cleansed, and a governance process put in place, before integrating with analytic tools and systems.
  • Secondly, put clean and trusted data in the hands of data scientists who can distill the relevant from the irrelevant and formulate commercial insights that the business can act on and profit from.
  • Lastly, plan and organize for success. IT and business need to align behind the same agenda, regularly reviewing business priorities and adjusting as needed. Maximize scarce IT resources by leveraging existing technologies and Cloud platforms, and by forming alliances with 3rd party vendors to fill skills gaps. Secure quick wins for your Big Data initiatives; maybe start with integrating historical transaction data with real-time purchase data to make personalized offers at point of sale. Look outside your organization to other industries, like retail banking or telecommunications, and learn from their successes and failures.

With the right approach, Big Data will deliver the return on investment for retailers.


The ERP Data Trap

ERP systems were a true competitive advantage 20+ years ago, but not so today. They gave people the best view into their business at a time when there really were only ERP systems and databases; today, that critical data resides in many other places. There are several reasons why ERP systems act as a data trap: technical factors, out-of-date management theory, and big data trends. First, let’s talk about management theory.

There are two fundamental concepts that have been driving much of the strategic planning in modern organizations in recent decades. The idea of economies of scale is deeply embedded in our thinking. The concept was first introduced by Adam Smith in the 18th century and reinforced throughout the 20th century by thinkers such as Bruce Henderson. In 1968 Henderson wrote, “Costs characteristically decline by 20-30% in real terms each time accumulated experience doubles.” The basic idea is that bigger is better. (more…)


Big Data Driving Data Integration at the NIH

The National Institutes of Health announced new grants to develop big data technologies and strategies.

“The NIH multi-institute awards constitute an initial investment of nearly $32 million in fiscal year 2014 by NIH’s Big Data to Knowledge (BD2K) initiative and will support development of new software, tools and training to improve access to these data and the ability to make new discoveries using them, NIH said in its announcement of the funding.”

The grants will address issues around Big Data adoption, including:

  • Locating data and the appropriate software tools to access and analyze the information.
  • Lack of data standards, or low adoption of standards across the research community.
  • Insufficient policies to facilitate data sharing while protecting privacy.
  • Unwillingness to collaborate that limits the data’s usefulness in the research community.

Among the tasks funded is the creation of a “Perturbation Data Coordination and Integration Center.”  The center will provide support for data science research that focuses on interpreting and integrating data from different data types and databases.  In other words, it will make sure the data moves to where it should move, in order to provide access to information that’s needed by the research scientist.  Fundamentally, it’s data integration practices and technologies.

This is very interesting from the standpoint that the movement into big data systems often drives the reevaluation of, or even new interest in, data integration. As the data becomes strategically important, the need to provide core integration services becomes even more important.

The project at the NIH will be interesting to watch, as it progresses.  These are the guys who come up with the new paths to us being healthier and living longer.  The use of Big Data provides the researchers with the advantage of having a better understanding of patterns of data, including:

  • Patterns of symptoms that lead to the diagnosis of specific diseases and ailments.  Doctors may get these data points one at a time.  When unstructured or structured data exists, researchers can find correlations, and thus provide better guidelines to physicians who see the patients.
  • Patterns of cures that are emerging around specific treatments.  The ability to determine what treatments are most effective, by looking at the data holistically.
  • Patterns of failure.  When the outcomes are less than desirable, what seems to be a common issue that can be identified and resolved?

Of course, the uses of big data technology are limitless, when considering the value of knowledge that can be derived from petabytes of data.  However, it’s one thing to have the data, and another to have access to it.

Data integration should always be systemic to all big data strategies, and the NIH clearly understands this to be the case.  Thus, they have funded data integration along with the expansion of their big data usage.

Most enterprises will follow much the same path in the next 2 to 5 years.  Information provides a strategic advantage to businesses.  In the case of the NIH, it’s information that can save lives.  Can’t get much more important than that.


If Data Projects Weather, Why Not Corporate Revenue?

Every fall Informatica sales leadership puts together its strategy for the following year. The revenue target is typically a function of the number of sellers, the addressable market size and key accounts in a given territory, average spend and conversion rate given prior years’ experience, etc. This straightforward math has probably not changed in decades, but it assumes that the underlying data are 100% correct. This data includes:

  • Number of accounts with a decision-making location in a territory
  • Related IT spend and prioritization
  • Organizational characteristics like legal ownership, industry code, credit score, annual report figures, etc.
  • Key contacts, roles and sentiment
  • Prior interaction (campaign response, etc.) and transaction (quotes, orders, payments, products, etc.) history with the firm

Every organization, whether it is a life insurer, a pharmaceutical manufacturer, a fashion retailer or a construction company, knows this math and plans on getting somewhere above 85% achievement of the resulting target. Office locations, support infrastructure spend, compensation and hiring plans are based on this and communicated.


We Are Not Modeling the Global Climate Here

So why is it that when it is an open secret that the underlying data is far from perfect (accurate, current and useful) and corrupts outcomes, too few believe that fixing it has any revenue impact?  After all, we are not projecting the climate for the next hundred years here with a thousand plus variables.

If corporate hierarchies are incorrect, your spend projections based on incorrect territory targets, credit terms and discount strategy will be off.  If every client touch point does not have a complete picture of cross-departmental purchases and campaign responses, your customer acquisition cost will be too high as you will contact the wrong prospects with irrelevant offers.  If billing, tax or product codes are incorrect, your billing will be off.  This is a classic telecommunication example worth millions every month.  If your equipment location and configuration is wrong, maintenance schedules will be incorrect and every hour of production interruption will cost an industrial manufacturer of wood pellets or oil millions.

Also, if industry leaders enjoy an upsell ratio of 17% and you experience 3%, data will have a lot to do with it (assuming you have no formal upsell policy because it would violate your independent middleman relationship).

The challenge is not whether data can create revenue improvements but how much, given the other factors: people and process.

Every industry laggard can identify a few FTEs who spend 25% of their time putting one-off data repositories together for some compliance, M&A, customer or marketing analytics. Organic revenue growth from net-new or previously unrealized revenue is what the focus of any data management initiative should be. Don’t get me wrong; purposeful recruitment (people), comp plans and training (processes) are important as well. Few people doubt that people and process drive revenue growth. However, few believe data being fed into these processes has an impact.

This is a head scratcher for me. An IT manager at a US upstream oil firm once told me that it would be ludicrous to think data has a revenue impact.  They just fixed data because it is important so his consumers would know where all the wells are and which ones made a good profit.  Isn’t that assuming data drives production revenue? (Rhetorical question)

A CFO at a smaller retail bank said during a call that his account managers know their clients’ needs and history. There is nothing more good data can add in terms of value.  And this happened after twenty other folks at his bank including his own team delivered more than ten use cases, of which three were based on revenue.

Hard cost (materials and FTE) reduction is easy, and cost avoidance is a leap of faith to a degree, but revenue is not any less concrete; otherwise, why not just throw the dice and see what the revenue will look like next year without a central customer database? Let every department have each account executive get their own data, structure it the way they want and put it on paper and make hard copies for distribution to HQ. This is not about paper versus electronic but the inability to reconcile data from many sources on paper, which is a step above electronic.

Have you ever heard of any organization moving back to the Fifties and competing today? That would be a fun exercise. Thoughts, suggestions – I would be glad to hear them.


Has Hadoop Crossed The Chasm? Thoughts About Strata 2014

Well, it’s been a little over a week since the Strata conference so I thought I should give some perspective on what I learned. I think it was summed up at my first meeting, on the first morning of the conference. The meeting was with a financial services company that has significant experience with Hadoop. The first words out of their mouths were, “Hadoop is hard.”

Later in the conference, after a Western Union representative spoke about their Hadoop deployment, they were mobbed by end user questions and comments. The audience was thrilled to hear about an actual operational deployment: Not just a sandbox deployment, but an actual operational Hadoop deployment from a company that is over 160 years old.

The market is crossing the chasm from early adopters who love to hand code (and the macho culture of proving they can do the hard stuff) to more mainstream companies that want to use technology to solve real problems. These mainstream companies aren’t afraid to admit that it is still hard. For the early adopters, nothing is ever hard. They love hard. But the mainstream market doesn’t view it that way.  They don’t want to mess around in the bowels of enabling technology.  They want to use the technology to solve real problems.  The comment from the financial services company represents the perspective of the vast majority of organizations. It is a sign Hadoop is hitting the mainstream market.

More proof we have moved to a new phase? Cloudera announced they were going from shipping six versions a year down to just three. I have been saying for a while that we will know that Hadoop is real when the distribution vendors stop shipping every 2 months and go to a more typical enterprise software release schedule. It isn’t that Hadoop engineering efforts have slowed down. It is still evolving very rapidly. It is just that real customers are telling the Hadoop suppliers that they can’t upgrade that fast because they have real business projects running. So for those of you who are disappointed by the “slow down,” don’t be. To me, this is news that Hadoop is reaching critical mass.

Technology is closing the gap to allow organizations to use Hadoop as a platform without having to actually have an army of Hadoop experts.  That is what Informatica does for data parsing, data integration,  data quality and data lineage (recent product announcement).  In fact, the number one demo at the Informatica booth at Strata was the demonstration of “end to end” data lineage for data, going from the original source all the way to how it was loaded and then transformed within Hadoop.  This is purely an enterprise-class capability that becomes more interesting and important when you actually go into true production.

Informatica’s goal is to hide the complexity of Hadoop so companies can get on with the work of using the platform with the skills they already have in house.  And from what I saw from all of the start-up companies that were doing similar things for data exploration and analytics and all the talk around the need for governance, we are finally hitting the early majority of the market.  So, for those of you who still drop down to the underlying UNIX OS that powers a Mac, the rest of us will keep using the GUI.   To the extent that there are “fit for purpose” GUIs on top of Hadoop, the technology will get used by a much larger market.

So congratulations Hadoop, you have officially crossed the chasm!

P.S. See me on theCUBE talking about a similar topic at: youtu.be/oC0_5u_0h2Q


Fast and Fasterer: Screaming Streaming Data on Hadoop


Guest Post by Dale Kim

This is a guest blog post, written by Dale Kim, Director of Product Marketing at MapR Technologies.

Recently published research shows that “faster” is better than “slower.” The point, ladies and gentlemen, is that speed, for lack of a better word, is good. But granted, you won’t always have the need for speed. My Lamborghini is handy when I need to elude the Bakersfield fuzz on I-5, but it does nothing for my Costco trips. There, I go with capacity and haul home my 30-gallon tubs of ketchup with my Ford F150. (Note: this is a fictitious example, I don’t actually own an F150.)

But if speed is critical, like in your data streaming application, then Informatica Vibe Data Stream and the MapR Distribution including Apache™ Hadoop® are the technologies to use together. And since Vibe Data Stream works with any Hadoop distribution, my discussion here is more broadly applicable. I first discussed this topic earlier this year during my presentation at Informatica World 2014. In that talk, I also briefly described architectures that include streaming components, like the Lambda Architecture and enterprise data hubs. I recommend that any enterprise architect become familiar with these high-level architectures.

Data streaming deals with a continuous flow of data, often at a fast rate. As you might’ve suspected by now, Vibe Data Stream, based on the Informatica Ultra Messaging technology, is great for that. With its roots in high speed trading in capital markets, Ultra Messaging quickly and reliably gets high value data from point A to point B. Vibe Data Stream adds management features to make it consumable by the rest of us, beyond stock trading. Not surprisingly, Vibe Data Stream can be used anywhere you need to quickly and reliably deliver data (just don’t use it for sharing your cat photos, please), and that’s what I discussed at Informatica World. Let me discuss two examples I gave.

Large Query Support. Let’s first look at “large queries.” I don’t mean the stuff you type on search engines, which are typically no more than 20 characters. I’m referring to an environment where the query is a huge block of data. For example, what if I have an image of an unidentified face, and I want to send it to a remote facial recognition service and immediately get the identity? The image would be the query, the facial recognition system could be run on Hadoop for fast divide-and-conquer processing, and the result would be the person’s name. There are many similar use cases that could leverage a high speed, reliable data delivery system along with a fast processing platform, to get immediate answers to a data-heavy question.

Data Warehouse Onload. For another example, we turn to our old friend the data warehouse. If you’ve been following all the industry talk about data warehouse optimization, you know pumping high speed data directly into your data warehouse is not an efficient use of your high value system. So instead, pipe your fast data streams into Hadoop, run some complex aggregations, then load that processed data into your warehouse. And you might consider freeing up large processing jobs from your data warehouse onto Hadoop. As you process and aggregate that data, you create a data flow cycle where you return enriched data back to the warehouse. This gives your end users efficient analysis on comprehensive data sets.
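The onload pattern can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the event fields (`sensor_id`, `value`) and the `aggregate` helper are hypothetical, and a real deployment would run the aggregation on Hadoop rather than in a single process.

```python
from collections import defaultdict


def aggregate(events):
    """Collapse raw events into one summary row per sensor (count, total, mean)."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for event in events:
        bucket = totals[event["sensor_id"]]
        bucket["count"] += 1
        bucket["total"] += event["value"]
    return [
        {
            "sensor_id": sensor_id,
            "count": bucket["count"],
            "total": bucket["total"],
            "mean": bucket["total"] / bucket["count"],
        }
        for sensor_id, bucket in sorted(totals.items())
    ]


# Hypothetical high-rate stream: many small events arrive...
stream = [
    {"sensor_id": "a", "value": 1.0},
    {"sensor_id": "b", "value": 4.0},
    {"sensor_id": "a", "value": 3.0},
]

# ...but only the aggregated rows would be loaded into the warehouse.
warehouse_rows = aggregate(stream)
print(warehouse_rows)
```

The point of the design is that the warehouse only ever sees the compact, aggregated rows, while the raw high-speed stream stays on the cheaper processing platform.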

Hopefully this stirs up ideas on how you might deploy high speed streaming in your enterprise architecture. Expect to see many new stories of interesting streaming applications in the coming months and years, especially with the anticipated proliferation of internet-of-things and sensor data.

To learn more about Vibe Data Stream you can find it on the Informatica Marketplace.


 


Ebola: Why Big Data Matters

The Ebola virus outbreak in West Africa has now claimed more than 4,000 lives and has entered the borders of the United States. While emergency response teams, hospitals, charities, and non-governmental organizations struggle to contain the virus, could big data analytics help?

A growing number of Data Scientists believe so.

You may recall the cholera outbreak in Haiti in 2010, after the tragic earthquake. A joint research team from Karolinska Institute in Sweden and Columbia University in the US analyzed calling data from two million mobile phones on the Digicel Haiti network. This enabled the United Nations and other humanitarian agencies to understand population movements during the relief operations and during the subsequent cholera outbreak. They could allocate resources more efficiently and identify areas at increased risk of new cholera outbreaks.

Mobile phones are widely owned, even in the poorest countries in Africa. They are also a rich source of data in regions where other reliable sources are sorely lacking. Senegal’s Orange Telecom provided Flowminder, a Swedish non-profit organization, with anonymized voice and text data from 150,000 mobile phones. Using this data, Flowminder drew up detailed maps of typical population movements in the region.

Today, authorities use this information to evaluate the best places to set up treatment centers, check-posts, and issue travel advisories in an attempt to contain the spread of the disease.

The first drawback is that this data is historic. Authorities really need to be able to map movements in real time especially since people’s movements tend to change during an epidemic.

The second drawback is that the scope of data provided by Orange Telecom is limited to a small region of West Africa.

Here is my recommendation to the Centers for Disease Control and Prevention (CDC):

  1. Increase the area for data collection to the entire region of Western Africa, which covers over 2.1 million cell-phone subscribers.
  2. Collect mobile phone mast activity data to pinpoint where calls to helplines are mostly coming from, draw population heat maps, and track population movement. A sharp increase in calls to a helpline is usually an early indicator of an outbreak.
  3. Overlay this data on census data to build up a richer picture.
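The "sharp increase in calls" signal in step 2 could be approximated with a simple baseline comparison. Here is a minimal Python sketch, assuming daily helpline call counts per mast are already available; the `flag_spikes` name, the 7-day window and the factor of 2 are illustrative assumptions, not an epidemiological standard, and the call counts are made up.

```python
def flag_spikes(daily_calls, window=7, factor=2.0):
    """Return indexes of days whose call count exceeds `factor` times
    the mean of the preceding `window` days."""
    spikes = []
    for day in range(window, len(daily_calls)):
        baseline = sum(daily_calls[day - window:day]) / window
        if baseline > 0 and daily_calls[day] > factor * baseline:
            spikes.append(day)
    return spikes


# Hypothetical helpline call counts for one mast over ten days.
calls = [10, 12, 11, 9, 10, 13, 12, 11, 30, 35]
spike_days = flag_spikes(calls)
print(spike_days)
```

Days flagged this way would only be candidates for follow-up; overlaying them on census data, as in step 3, is what would turn a raw spike into a usable picture.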

The most positive impact we can have is to help emergency relief organizations and governments anticipate how a disease is likely to spread. Until now, they had to rely on anecdotal information, on-the-ground surveys, police, and hospital reports.


Informatica’s Hadoop Connectivity Reaches for the Clouds

The Informatica Cloud team has been busy updating connectivity to Hadoop using the Cloud Connector SDK. Updated connectors are available now for Cloudera and Hortonworks, and new connectivity has been added for MapR, Pivotal HD and Amazon EMR (Elastic MapReduce).

Informatica Cloud’s Hadoop connectivity brings a new level of ease of use to Hadoop data loading and integration.  Informatica Cloud provides a quick way to load data from popular on premise data sources and apps such as SAP and Oracle E-Business, as well as SaaS apps, such as Salesforce.com, NetSuite, and Workday, into Hadoop clusters for pilots and POCs.  Less technical users are empowered to contribute to enterprise data lakes through the easy-to-use Informatica Cloud web user interface.


Informatica Cloud’s rich connectivity to a multitude of SaaS apps can now be leveraged with Hadoop.  Data from SaaS apps for CRM, ERP and other lines of business are becoming increasingly important to enterprises. Bringing this data into Hadoop for analytics is now easier than ever.

Users of Amazon Web Services (AWS) can leverage Informatica Cloud to load data from SaaS apps and on premise sources into EMR directly.  Combined with connectivity to Amazon Redshift, Informatica Cloud can be used to move data into EMR for processing and then onto Redshift for analytics.

Self service data loading and basic integration can be done by less technical users through Informatica Cloud’s drag and drop web-based user interface.  This enables more of the team to contribute to and collaborate on data lakes without having to learn Hadoop.

Bringing the cloud and Big Data together to put the potential of data to work – that’s the power of Informatica in action.

Free trials of the Informatica Cloud Connector for Hadoop are available here: http://www.informaticacloud.com/connectivity/hadoop-connector.html


Go On, Flip Your Division of Labor: More Time Analyzing and Less Time Prepping Data

Are you in Sales Operations or Marketing Operations, or are you a Sales Representative/Manager or Marketing Professional? It’s no secret that if you are, you benefit greatly from the power of performing your own analysis, at your own rapid pace. When you have a hunch, you can easily test it out by visually analyzing data in Tableau without involving IT. When you are faced with tight timeframes in which to gain business insight from data, being able to do it yourself in the time you have available and without technical roadblocks makes all the difference.

Self-service Business Intelligence is powerful! However, we all know it can be even more powerful. We know that when you put together an analysis, you spend about 80% of your time putting together data, and then just 20% of your time analyzing data to test out your hunch or gain your business insight. You don’t need to accept this anymore. We want you to know that there is a better way!

We want you to Flip Your Division of Labor: spend more than 80% of your time analyzing data to test out your hunch or gain your business insight, and less than 20% of your time putting together data for your Tableau analysis! That’s right. You like it. No, you love it. No, you are ready to run laps around your chair in sheer joy!! And you should feel this way. You can now spend more time on the higher-value activity of gaining business insight from the data, and even find copious time to spend with your family. How’s that?

Project Springbok is a visionary new product designed by Informatica with the goal of making data access and data quality obstacles a thing of the past. Springbok is meant for the Tableau user, a data person who would rather spend their time visually exploring information and finding insight than struggling with complex calculations or waiting for IT. Project Springbok allows you to put together your data, rapidly, for subsequent analysis in Tableau. Project Springbok tells you things about your data that even you may not have known. It does this through Intelligent Suggestions that it presents to the user.

Let’s take a quick tour:

  • Project Springbok tells you that you have a date column and that you likely want to obtain the Year and Quarter for your analysis (Fig 1). If you so wish, with a single click, voila, you have your corresponding years and even the quarters. And it all happens in mere seconds, a far cry from the 45 minutes it would have taken a fluent user of Excel to do using VLOOKUPs.

Fig. 1

VALUE TO A MARKETING CAMPAIGN PROFESSIONAL: Rapidly validate and accurately complete your segmentation list, before you analyze your segments in Tableau. Base your segments on trusted data that did not take you days to validate and enrich.
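For readers curious about what a year-and-quarter suggestion amounts to behind the single click, here is a minimal Python sketch. It is not Springbok’s implementation. The function name `add_year_and_quarter`, the `order_date` field, and the sample rows are all hypothetical, and dates are assumed to arrive as ISO strings (YYYY-MM-DD).

```python
from datetime import date


def add_year_and_quarter(rows, date_field="order_date"):
    """Return copies of the rows with derived 'year' and 'quarter' fields.

    Assumes (hypothetically) that each row is a dict whose date_field
    holds an ISO-formatted date string such as '2014-02-17'.
    """
    enriched = []
    for row in rows:
        d = date.fromisoformat(row[date_field])
        # Months 1-3 map to Q1, 4-6 to Q2, 7-9 to Q3, 10-12 to Q4.
        quarter = (d.month - 1) // 3 + 1
        enriched.append({**row, "year": d.year, "quarter": f"Q{quarter}"})
    return enriched


# Made-up sample data, for illustration only.
orders = [
    {"order_id": 1, "order_date": "2014-02-17"},
    {"order_id": 2, "order_date": "2014-10-03"},
]

for row in add_year_and_quarter(orders):
    print(row["order_id"], row["year"], row["quarter"])
```

The original rows are left untouched; each enriched row is a copy with the two new fields added, which mirrors the idea of adding columns rather than overwriting the date.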

  • Then Project Springbok will tell you that you have two datasets that could be joined on a common key, such as an email column present in each dataset, and ask whether you would like to move forward and join them (Fig 2). If you agree with Project Springbok’s suggestion, voila, your datasets are joined in a mere few seconds. Again, a far cry from the 45 minutes it would have taken a fluent user of Excel to do using VLOOKUPs.

  Fig. 2
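To make the idea of joining on a common key concrete, here is a small Python sketch of an inner join on email. It only illustrates the technique and says nothing about how Springbok does it. The function `join_on_key`, the field names, and the sample records are hypothetical, and matching emails case-insensitively is an assumption made for this example.

```python
def join_on_key(left, right, key="email"):
    """Inner-join two lists of dicts on a shared key.

    Keys are compared after trimming whitespace and lowercasing, an
    assumption that suits email addresses. When both rows carry the
    same field, the value from the left row wins.
    """
    # Index the right-hand rows by normalized key for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key].strip().lower(), []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key].strip().lower(), []):
            joined.append({**match, **row})
    return joined


# Made-up sample data, for illustration only.
crm_contacts = [
    {"email": "Ana@example.com", "name": "Ana"},
    {"email": "raj@example.com", "name": "Raj"},
]
campaign_responses = [
    {"email": "ana@example.com", "campaign": "Fall Webinar"},
    {"email": "lee@example.com", "campaign": "Fall Webinar"},
]

for row in join_on_key(crm_contacts, campaign_responses):
    print(row["name"], row["email"], row["campaign"])
```

Only rows whose email appears in both datasets survive an inner join, so it is worth checking how many rows drop out before analyzing the result.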

VALUE TO A SALES REPRESENTATIVE OR SALES MANAGER: You can now access your Salesforce.com data (Fig 3) and effortlessly combine it with ERP data to understand your true quota attainment. Never miss quota again due to a revenue split, be it territory or otherwise. Best of all, keep your attainment dataset refreshed and even know exactly which data point changed when your true attainment changes.

Fig. 3

  • Then, if you want, Project Springbok will tell you that you have emails in the dataset, which you may or may not have known, but more importantly it will ask you if you wish to determine which emails can actually be mailed to. If you proceed, not only will Springbok check each email for correct structure (Fig 4), but it will very soon also determine whether the email is indeed active and one you can expect a response from. How long would that have taken you to do?

VALUE TO A TELESALES REPRESENTATIVE OR MARKETING EMAIL CAMPAIGN SPECIALIST: Ever thought you had a great email list and then found out most emails bounced? Now, confidently determine which emails you will truly be able to reach, before you send the message. Email prospects who you know are actually at the company and be confident you have their correct email addresses. You can then easily push the dataset into Tableau to analyze the trends in email list health.

Fig. 4
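As a rough illustration of what checking an email for correct structure can mean, here is a short Python sketch. The pattern is a deliberately simple assumption for this example, not the full email address standard and not Springbok’s actual rule, and the names `EMAIL_PATTERN` and `has_valid_structure` are hypothetical. A structural check also cannot tell you whether a mailbox is active.

```python
import re

# Simplified, assumed pattern: a local part, an "@", and a domain
# made of at least two dot-separated labels.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+"
)


def has_valid_structure(email):
    """Return True if the string looks like an email address."""
    return EMAIL_PATTERN.fullmatch(email.strip()) is not None


# Made-up sample data, for illustration only.
candidates = [
    "jane.doe@example.com",
    "jane.doe@example",
    "not-an-email",
    " bob@example.org ",
]

for candidate in candidates:
    print(repr(candidate), has_valid_structure(candidate))
```

A check like this catches typos and truncated addresses; confirming that an address is live requires a separate verification step.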

And, in case you were wondering, there is no training or install required for Project Springbok. The 80% of your time you used to spend on data preparation has now shrunk considerably, and this is after using only a few of Springbok’s capabilities. One more thing: You can even export directly from Project Springbok into Tableau via the “Export to Tableau TDE” menu item (Fig 5). Project Springbok creates a Tableau TDE file, and you just double-click on it to open Tableau to test out your hunch or gain your business insight.

Fig. 5

Here are some other things you should know, to convince you that you, too, can spend no more than 20% of your time on putting together data for your subsequent Tableau analysis:

  • Springbok Sign-Up is Free
  • Springbok automatically finds problems with your data, and lets you fix them with a single click
  • Springbok suggests useful ways for you to combine different datasets, and lets you combine them effortlessly
  • Springbok suggests useful summarizations of your data, and lets you follow through on the summarizations with a single click
  • Springbok allows you to access data from your cloud or on-premise systems with a few clicks, and then automatically keeps it refreshed. It will even tell you what data changed from the last time you saw it
  • Springbok allows you to collaborate by sharing your prepared data with others
  • Springbok easily exports your prepared data directly into Tableau for immediate analysis. You do not have to tell Tableau how to interpret the prepared data
  • Springbok requires no training or installation
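The point above about knowing what data changed since you last saw it can be pictured as a comparison of two snapshots. The Python sketch below is a generic illustration under stated assumptions, not a description of Springbok’s refresh feature: the function `diff_snapshots`, the `id` key, and the sample records are hypothetical.

```python
def diff_snapshots(previous, current, key="id"):
    """Compare two snapshots of a dataset, each a list of dicts.

    Returns (added, removed, changed), where changed is a list of
    (key_value, {field: (old_value, new_value)}) tuples.
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    added = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    removed = [row for k, row in prev_by_key.items() if k not in curr_by_key]

    changed = []
    for k, new_row in curr_by_key.items():
        old_row = prev_by_key.get(k)
        if old_row is None:
            continue  # Already reported as added.
        fields = {
            f: (old_row.get(f), new_row.get(f))
            for f in sorted(set(old_row) | set(new_row))
            if old_row.get(f) != new_row.get(f)
        }
        if fields:
            changed.append((k, fields))
    return added, removed, changed


# Made-up sample data, for illustration only.
last_week = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
today = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 300},
    {"id": 3, "amount": 75},
]

added, removed, changed = diff_snapshots(last_week, today)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Keying the comparison on a stable identifier is the important design choice here; without one, a changed row is indistinguishable from one row removed and another added.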

Go on. Shift your division of labor in the right direction, fast. Sign-Up for Springbok and stop wasting precious time on data preparation. http://bit.ly/TabBlogs

———-

Are you going to be at Dreamforce this week in San Francisco? Interested in seeing Project Springbok working with Tableau in a live demonstration? Visit the Informatica or Tableau booths and see the power of these two solutions working hand-in-hand. Informatica is at Booth #N1216 and Booth #9 in the Analytics Zone. Tableau is located in Booth N2112.

Posted in B2B, Big Data, Business Impact / Benefits, Business/IT Collaboration, General