Category Archives: Big Data

IoT Case Study: Fresh Powder = Fresh Data

EpicMix is a website, data integration solution and web application that provides a great example of how companies can provide more value to their customers when they think about data-ready architecture. In this case the company is Vail Resorts and it is great to look at this as an IoT case study since the solution has been in use since 2010.

The basics of EpicMix

* RFID technology embedded into lift tickets provide the ability to collect data for anyone using one at any Vail managed. Vail realized they had all these lift tickets being worn and there was an opportunity to use them to collect data that could enhance the experience of their guests. It also is a very clever way to collect data on skiers to help drive segmentation and marketing decisions.

* EpicMix just works. If any guest wants to take advantage all they have to do is register on the website or download the mobile app for their Android or iOS smart phone and register. Having a low bar to use is important to getting people to try out the app and even if people do not use the EpicMix website or app Vail is still able to leverage the data they are generating to better understand what people do on the mountain. (Vail has a detailed information policy and opt out policy)

IoT Case Study: Fresh Powder = Fresh Data

Figure 1: EpicMix Public Dashboard

* Value added features beyond data visibility. What makes the solution more interesting are the features that go beyond just tracking skiing performance. These include private messaging between guests while on the mountain, sharing photos with friends, integration to personal social media accounts and the ability for people to earn badges and participate in challenges. These go beyond the generation one solution that would just track performance and nothing else.

IoT Case Study: Fresh Powder = Fresh Data

Figure 2: Phone Application EpicMix view

This is the type of solution that qualifies as a IoT Personal Productivity solution and a Business Productivity solution.

  • For the skier it provides information on their activity, communication and sharing information on social media.
  • For Vail it allows them to better understand their guests, better communicate and offer their guests additional services and benefits and also how to use their resources or deploy their employees.

The EpicMix solution was made possible by taking advantage of data that was not being collected and then making it useful to users (skiers & guests). Having used EpicMix and similar performance tracking solutions the added communication and collaboration features are what sets it apart and the ease of use in getting started make it a great example of how fresh data can come from anywhere.

In the future it is easy to imagine features being added that streamlined ordering services for users (table reservation at the restaurant for Apre-ski) or Vail leveraging the data to make business decisions to provide more real time offers to guests on the mountain or frequent visitors on their next visit. And maybe we will see some of the new ski oriented wearables like XON bindings be integrated to solutions like EpicMix so it is possible to get even more data without having to have a second smart phone application.

Information for this post comes from Mapleton Hill Media and Vail Resorts

Share
Posted in Big Data, Cloud Computing, Data Integration | Tagged | Leave a comment

Sports Analytics For Players, Owners and Fans

Sports_Analytics

Sports Analytics For Players, Owners and Fans

In the 2011 film Moneyball Billy Beane introduced to the sports industry how to use data analytics to acquire statistically optimal players for the Oakland A’s. In the last 4 years, advancements in data collection, preparation, aggregation and advanced analytics technology have made it possible to broaden the scope of applying analytics beyond the game and player, drastically change the shape of an industry that has a long history built on tradition.

Last week, MIT Sloan held its 9th annual Sports Analytics Conference in Boston, MA. Amidst the 6 foot snow banks, sports fanatics and data scientists came together at this sold out event to discuss the increasing role of analytics in the sports industry. This year’s conference agenda included topics spanning game statistics and modeling, player contract and salary negotiations, dynamic ticket pricing, referee calls to improving fan experiences.

This latter topic, improving fan experiences, is one that has seen a boost in technology innovation such that data is more readily available for use in analytics. For example, newer NFL stadiums are wifi connected throughout so that fans can watch replays on their devices, tweet, and share selfies during the game. With mobile devices connected to the stadium’s wifi, franchises can drive revenue generating marketing campaigns to their home fan base throughout the game.

More important, however, is the need to keep the Millennial Generation interested in watching games live. In an article posted by TechRepublic, college students are more likely to leave a game during halftime if they are not able to connect to the internet or use social media. Teams need to keep fans in the stadiums so the goal needs to ensure the fan experience in a live venue matches what they can experience at home.

Innovation in advanced analytics and Big Data platforms such as Hadoop gives sports analysts the ability to access significant volumes of detailed data resulting in greater modeling accuracy. Streamlined data preparation tools speed the process from receiving raw data to delivering insight. Advanced analytics offered in the cloud as a service offers team owners and managers access to predictive analytics tools without having to manage and staff large data centers. Better visualization applications provide an effective way to communicate what the data means to those without a math degree.

When applying these innovations to new data sources while combining with advancements of analytics in sports, the results will be game changing far beyond what Billy Beane was able to accomplish with the Okland A’s.

Our congratulations to the winners of the top research papers submitted at the MIT Sloan Sports Analytics conference: Who Is Responsible For A Called Strike? and Counterpoints: Advanced Defensive Metrics for NBA Basketball. It will be interesting to see how these models will make an impact, with Spring Training and March Madness just around the corner. Maybe next year, we will see a submission on the dependencies of atmospheric conditions on football pressure and its impact on the NFL playoffs (PV=NRT) and get a data-driven explanation of Deflate Gate.

Share
Posted in Big Data, Wearable Devices | Tagged , , | Leave a comment

How is Predictive Analytics Driving Business Performance?

QuestionRecently, I got to attend the Predictive Analytics Summit in San Diego. It felt great to be in a room full of data scientists from around the world—all my hidden statistics, operations research, and even modeling background came back to me instantly. I was most interested to learn what this vanguard was doing as well as any lessons learned that could be shared with the broader analytics audience. Presenters ranged from Internet leaders to more traditional companies like Scotts Miracle Gro. Brendan Hodge of Scotts Miracle Gro in fact said, as 125 year old company, he feels like “a dinosaur at a mammal convention”. So in the space that follows, I will share my key take-aways from some of the presenters.

Fei Long from 58.com

58.com is the Craigslist, Yelp, and Monster of China. Fei shared that 58.com is using predictive analytics to recommend resumes to employers and to drive more intelligent real time bidding for its products. Fei said that 58.com has 300 million users—about the number of people in the United States. Most interesting, Fei said that predictive analytics has driven a 10-20% increase in 58.com’s click through rate.

Ian Zhao from eBay

Ian said that eBay is starting to increase the footprint of its data science projects. He said that historical the focus for eBay’s data science was marketing, but today eBay is applying data science to sales and HR. Provost and Fawcett agree in “Data Science for Business” by saying that “the widest applications of data mining techniques are in marketing for tasks such as target marketing, online advertising, and recommendations for cross-selling”.

Ian said that in the non-marketing areas, they are finding a lot less data. The data is scattered across data sources, and requires a lot more cleansing. Ian is using things like time series and ARIMA to look at employee attrition. One thing that Ian found that was particularly interesting is that there is strong correlation between attrition and bonus payouts. Ian said it is critical to leave ample time for data prep. He said that it is important to start the data prep process by doing data exploration and discovery. This includes confirming that data is available for hypothesis testing. Sometimes, Ian said that this the data prep process can include inputting data that is not available in the data set and validating data summary statistics. With this, Ian said that data scientists need to dedicate time and resources for determining what things are drivers. He said with the business, data scientist should talk about likelihood because business people in general do not understand statistics. It is important as well that data scientist ask business people the so what questions. Data scientist should narrow things down to a dollar impact.

Barkha Saxena from Poshmark

Barkha is trying to model the value of user growth. Barkha said that this matters because Poshmark wants to be the #1 community driven marketplace. They want to use data to create a “personal boutique experience”. With 700,000 transactions a day, they are trying to measure customer lifetime value by implementing a cohort analysis. What was the most interesting in Barkha’s data is she discovered repeatable performance across cohorts. In their analysis, different models work better based upon the data—so a lot of time goes into procedurally determining the best model fit.

Meagan Huth from Google

Meagan said that Google is creating something that they call People Analytics. They are trying to make all people decisions by science and data. They want to make it cheaper and easier to work at Google. They have found through their research that good managers lower turnover, increase performance, and increase workplace happiness. The most interesting thing that she says they have found is the best predictor of being a good manager is being a good coach. They have developed predictive models around text threads including those that occur in employee surveys to ensure they have the data to needed to improve.

Hobson Lane from Sharp Labs

Hobson reminded everyone of the importance Nyquist (you need to sample data twice as fast as the fastest data event). This is especially important for organizations moving to the so called Internet of Things. Many of these devices have extremely large data event rates. Hobson, also, discussed the importance of looking at variance against the line that gets drawn in a regression analysis. Sometimes, multiple lines can be drawn. He, also, discussed the problem of not having enough data to support the complexity of the decision that needs to be made.

Ravi Iyer from Ranker

Ravi started by saying Ranker is a Yelp for everyone else. He then discussed the importance of have systematic data. A nice quote from him is as follows:  “better data=better predictions”. Ravi discussed as well the topic of response bias. He said that asking about Coke can lead to different answer when you ask about Coke or Coke at a movie. He discussed interesting how their research shows that millennials are really all about “the best”. I see this happening every time that I take my children out to dinner—there is no longer a cheap dinner out.

Ranjan Sinha at eBay

Ranjan discussed the importance of customer centric commerce and creating predictive models around it. At eBay, they want to optimize the customer experience and improve their ability to make recommendations. eBay is finding customer expectations are changing. For this reason, they want customer context to be modeled by looking at transactions, engagement, intent, account, and inferred social behaviors. With modeling completed, they are using complex event processing to drive a more automated response to data. An amazing example given was for Valentine Day’s where they use a man’s partner’s data to predict the items that the man should get for his significant other.

Andrew Ahn from LinkedIn

Andrew is using analytics to create what he calls an economic graph and to make professionals more productive. One area that he personally is applying predictive analytics to is with LinkedIn’s sales solutions. In LinkedIn Sales Navigator, they display potential customers based upon the sales person’s demographic data—effectively the system makes lead recommendations. However, they want to de-risk this potential interaction for sale professionals and potential customers. Andrews says at the same time that they have found through data analysis that small changes in a LinkedIn profile can lead to big changes. To put this together, they have created something that they call the social selling index. It looks at predictors that they have determined are statistically relevant including member demographic, site engagement, and social network. The SSI score is viewed as a predictive index. Andrew says that they are trying to go from serendipity to data science.

Robert Wilde from Slacker Radio

Robert discussed the importance of simplicity and elegance in model building. He then went through a set of modeling issues to avoid. He said that modelers need to own the discussion of causality and cause and effect and how this can bias data interpretation. In addition, looking at data variance was stressed because what does one do when a line doesn’t have a single point fall on it. Additionally, Robert discussed  what do you do when correlation is strong, weak, or mistaken. Is it X or Y that has the relationship. Or worse yet what do you do when there is coincidental correlation. This led to a discussion of forward and reverse causal inference. For this reason, Robert argued strongly for principal component analysis. This eliminates regression causational bias. At the same time, he suggested that models should be valued by complexity versus error rates.

Parsa Bakhtary from Facebook

Parsa has been looking at what games generate revenue and what games do not generate revenue for Facebook—Facebook amazingly has over 1,000 revenue bearing game. For this reason, Facebook wants to look at the Lifetime Value of Customers for Facebook Games—ithe dollar value of a relationship. Parsa said, however, there is a problem, only 20% pay for their games. Parsa argued that customer life time value (which was developed in the 1950s) doesn’t really work for apps where everyones lifetime is not the same. Additionally, social and mobile gamers are not particularly loyalty. He says that he, therefore, has to model individual games for their first 90 days across all periods of joining and then look at the cumulative revenue curves.

Predictive AnalyticsParting remarks

So we have seen here a wide variety of predictive analytics techniques being used by today’s data scientists. To me this says that predictive analytical approaches are alive and kicking. This is good news and shows that data scientists are trying to enable businesses to make better use of their data. Clearly, a key step that holds data scientist back today is data prep. While it is critical to leave ample time for data prep, it is also essential to get quality data to ensure models are working appropriately. At the same time, data prep needs to support inputting data that is not available within the original data set.

Related links

Solution Brief: Data Prep

Big Data: The Emperor may have their Clothes on but…

Should We Still be Calling it Big Data?

Big Data Integration

Big Data Decision Automation

Data Mastering

Data Lake + Analysis

Related Blogs

Analytics Stories: A Banking Case Study

Analytics Stories: A Pharmaceutical Case Study

Analytics Stories: An Educational Case Study

Analytics Stories: A Financial Services Case Study

Analytics Stories: A Healthcare Case Study

Major Oil Company Uses Analytics to Gain Business Advantage

Is it the CDO or CAO or Someone Else?

Should We Still be Calling it Big Data?

What Should Come First: Business Processes or Analytics?

Should Analytics Be Focused on Small Questions Versus Big Questions?

Who Owns Enterprise Analytics and Data?

Competing on Analytics

Is Big Data Destined To Become Small And Vertical?

Big Data Why?

What is big data and why should your business care?

Author Twitter: @MylesSuer

 

 

 

Share
Posted in 5 Sales Plays, Big Data, CIO, Complex Event Processing | Tagged | Leave a comment

Big Data is Nice to Have, But Big Culture is What Delivers Success

big_data

Big Data is Nice to Have, But Big Culture is What Delivers Success

Despite spending more than $30 Billion in annual spending on Big Data, successful big data implementations elude most organizations. That’s the sobering assessment of a recent study of 226 senior executives from Capgemini, which found that only 13 percent feel they have truly have made any headway with their big data efforts.

The reasons for Big Data’s lackluster performance include the following:

  • Data is in silos or legacy systems, scattered across the enterprise
  • No convincing business case
  • Ineffective alignment of Big Data and analytics teams across the organization
  • Most data locked up in petrified, difficult to access legacy systems
  • Lack of Big Data and analytics skills

Actually, there is nothing new about any of these issues – in fact, the perceived issues with Big Data initiatives so far map closely with the failed expect many other technology-driven initiatives. First, there’s the hype that tends to get way ahead of any actual well-functioning case studies. Second, there’s the notion that managers can simply take a solution of impressive magnitude and drop it on top of their organizations, expecting overnight delivery of profits and enhanced competitiveness.

Technology, and Big Data itself, is but a tool that supports the vision, well-designed plans and hard work of forward-looking organizations. Those managers seeking transformative effects need to look deep inside their organizations, at how deeply innovation is allowed to flourish, and in turn, how their employees are allowed to flourish. Think about it: if line employees suddenly have access to alternative ways of doing things, would they be allowed to run with it? If someone discovers through Big Data that customers are using a product differently than intended, do they have the latitude to promote that new use? Or do they have to go through chains of approval?

Big Data may be what everybody is after, but Big Culture is the ultimate key to success.

For its part, Capgemini provides some high-level recommendations for better baking in transformative values as part of Big Data initiatives, based on their observations of best-in-class enterprises:

The vision thing: “It all starts with vision,” says Capgemini’s Ron Tolido. “If the company executive leadership does not actively, demonstrably embrace the power of technology and data as the driver of change and future performance, nothing digitally convincing will happen. We have not even found one single exception to this rule. The CIO may live and breathe Big Data and there may even be a separate Chief Data Officer appointed – expect more of these soon – if they fail to commit their board of executives to data as the engine of success, there will be a dark void beyond the proof of concept.”

Establish a well-defined organizational structure: “Big Data initiatives are rarely, if ever, division-centric,” the Capgemini report states. “They often cut across various departments in an organization. Organizations that have clear organizational structures for managing rollout can minimize the problems of having to engage multiple stakeholders.”

Adopt a systematic implementation approach:  Surprisingly, even the largest and most sophisticated organizations that do everything on process don’t necessarily approach Big Data this way, the report states. “Intuitively, it would seem that a systematic and structured approach should be the way to go in large-scale implementations. However, our survey shows that this philosophy and approach are rare. Seventy-four percent of organizations did not have well-defined criteria to identify, qualify and select Big Data use-cases. Sixty-seven percent of companies did not have clearly defined KPIs to assess initiatives. The lack of a systematic approach affects success rates.”

Adopt a “venture capitalist” approach to securing buy-in and funding: “The returns from investments in emerging digital technologies such as Big Data are often highly speculative, given the lack of historical benchmarks,” the Capgemini report points out. “Consequently, in many organizations, Big Data initiatives get stuck due to the lack of a clear and attributable business case.” To address this challenge, the report urges that Big Data leaders manage investments “by using a similar approach to venture capitalists. This involves making multiple small investments in a variety of proofs of concept, allowing rapid iteration, and then identifying PoCs that have potential and discarding those that do not.”

Leverage multiple channels to secure skills and capabilities: “The Big Data talent gap is something that organizations are increasingly coming face-to-face with. Closing this gap is a larger societal challenge. However, smart organizations realize that they need to adopt a multi-pronged strategy. They not only invest more on hiring and training, but also explore unconventional channels to source talent. Capgemini advises reaching out to partner organizations for the skills needed to develop Big Data initiatives. These can be employee exchanges, or “setting up innovation labs in high-tech hubs such as Silicon Valley.” Startups may also be another source of Big Data talent.

Share
Posted in B2B, B2B Data Exchange, Big Data, Business Impact / Benefits | Tagged , , | Leave a comment

Information = Data + R

Data + R

Information = Data + R

Over and over, when talking with people who are starting to learn Data Science, there’s a frustration that comes up: “I don’t know which programming language to start with.”

Moreover, it’s not just programming languages; it’s also software systems like Tableau, SPSS, etc. There is an ever-widening range of tools and programming languages and it’s difficult to know which one to select.

I get it. When I started focusing heavily on data science a few years ago, I reviewed all of the popular programming languages at the time: Python, R, SAS, D3, not to mention a few that in hindsight, really aren’t that great for analytics like Perl, Bash, and Java. I once read a suggestion to use arcane tools like UNIX’s AWK and SED.

There are so many suggestions, so much material, so many options; it becomes difficult to know what to learn first. There’s a mountain of content, and it’s difficult to know where to find the “gold nuggets”; the things to learn that will bring you the high return on time investment.

That’s the crux of the problem. The fact is – time is limited. Learning a new programming language is a large investment in your time, so you need to be strategic about which one you select. To be clear, some languages will yield a very high return on your investment. Other languages are purely auxiliary tools that you might use only a few times per year.

Let me make this easy for you: learn R first. Here’s why:

R is becoming the “lingua franca” of data science

R is becoming the lingua franca for data science. That’s not to say that it’s the only language, or that it’s the best tool for every job. It is, however, the most widely used and it is rising in popularity.

As I’ve noted before, O’Reilly Media conducted a survey in 2014 to understand the tools that data scientists are currently using. They found that R is the most popular programming language (if you exclude SQL as a “proper” programing language).

Looking more broadly, there are other rankings that look at programming language popularity in general. For example, Redmonk measures programming language popularity by examining discussion (on Stack Overflow) and usage (on GitHub). In their latest rankings, R placed 13th, the highest of any statistical programming language. Redmonk also noted that R has been rising in popularity over time.

A similar ranking by TIOBE, which ranks programming languages by the number of search engine searches, indicates a strong year over year rise for R.

Keep in mind that the Redmonk and TIOBE rankings are for all programming languages. When you look at these, R is now ranking among the most popular and most commonly used over all.

Data wrangling

It’s often said that 80% of the work in data science is data manipulation. More often than not, you’ll need to spend significant amounts of your time “wrangling” your data; putting it into the shape you want. R has some of the best data management tools you’ll find.

The dplyr package in R makes data manipulation easy. It is the tool I wish I had years ago. When you “chain” the basic dplyr together, you can dramatically simplify your data manipulation workflow.

Data visualization

ggplot2 is one of the best data visualization tools around, as of 2015. What’s great about ggplot2 is that as you learn the syntax, you also learn how to think about data visualization.

I’ve said numerous times, that there is a deep structure to all statistical visualizations. There is a highly structured framework for thinking about and creating all data visualizations. ggplot2 is based on that framework. By learning ggplot2, you will learn how to think about visualizing data.

Moreover, when you combine ggplot2 and dplyr together (using the chaining methodology), finding insight in your data becomes almost effortless.

Machine learning

Finally, there’s machine learning. While I think most beginning data science students should wait to learn machine learning (it is much more important to learn data exploration first), machine learning is an important skill. When data exploration stops yielding insight, you need stronger tools.

When you’re ready to start using (and learning) machine learning, R has some of the best tools and resources.

One of the best, most referenced introductory texts on machine learning, An Introduction to Statistical Learning, teaches machine learning using the R programming language. Additionally, the Stanford Statistical Learning course uses this textbook, and teaches machine learning in R.

Summary: Learn R, and focus your efforts

Once you start to learn R, don’t get “shiny new object” syndrome.

You’re likely to see demonstrations of new techniques and tools. Just look at some of the dazzling data visualizations that people are creating.

Seeing other people create great work (and finding out that they’re using a different tool) might lead you to try something else. Trust me on this: you need to focus. Don’t get “shiny new object” syndrome. You need to be able to devote a few months (or longer) to really diving into one tool.

And as I noted above, you really want to build up your competence in skills across the data science workflow. You need to have solid skills at least in data visualization and data manipulation. You need to be able to do some serious data exploration in R before you start moving on.

Spending 100 hours on R will yield vastly better returns than spending 10 hours on 10 different tools. In the end, your time ROI will be higher by concentrating your efforts. Don’t get distracted by the “latest, sexy new thing.”

Twitter @bigdatabeat

Share
Posted in Big Data, Data Transformation, General, Hadoop, Professional Services | Tagged , , , , | Leave a comment

Building an Impactful Data Governance – One Step at a Time

Let’s face it, building a Data Governance program is no overnight task.  As one CDO puts it:  ”data governance is a marathon, not a sprint”.  Why? Because data governance is a complex business function that encompasses technology, people and process, all of which have to work together effectively to ensure the success of the initiative.  Because of the scope of the program, Data Governance often calls for participants from different business units within an organization, and it can be disruptive at first.

Why bother then?  Given that data governance is complex, disruptive, and could potentially introduce additional cost to a company?  Well, the drivers for data governance can vary for different organizations.  Let’s take a close look at some of the motivations behind data governance program.

For companies in heavily regulated industries, establishing a formal data governance program is a mandate.  When a company is not compliant, consequences can be severe. Penalties could include hefty fines, brand damage, loss in revenue, and even potential jail time for the person who is held accountable for being noncompliance. In order to meet the on-going regulatory requirements, adhere to data security policies and standards, companies need to rely on clean, connected and trusted data to enable transparency, auditability in their reporting to meet mandatory requirements and answer critical questions from auditors.  Without a dedicated data governance program in place, the compliance initiative could become an on-going nightmare for companies in the regulated industry.

A data governance program can also be established to support customer centricity initiative. To make effective cross-sells and ups-sells to your customers and grow your business,  you need clear visibility into customer purchasing behaviors across multiple shopping channels and touch points. Customer’s shopping behaviors and their attributes are captured by the data, therefore, to gain thorough understanding of your customers and boost your sales, a holistic Data Governance program is essential.

Other reasons for companies to start a data governance program include improving efficiency and reducing operational cost, supporting better analytics and driving more innovations. As long as it’s a business critical area and data is at the core of the process, and the business case is loud and sound, then there is a compelling reason for launching a data governance program.

Now that we have identified the drivers for data governance, how do we start?  This rather loaded question really gets into the details of the implementation. A few critical elements come to consideration including: identifying and establishing various task forces such as steering committee, data governance team and business sponsors; identifying roles and responsibilities for the stakeholders involved in the program; defining metrics for tracking the results.  And soon you will find that on top of everything, communications, communications and more communications is probably the most important tactic of all for driving the initial success of the program.

A rule of thumb?  Start small, take one-step at a time and focus on producing something tangible.

Sounds easy, right? Think this is easy?!Well, let’s hear what the real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture, Lisa Bemis, Director of Master Data, Fabian Torres, Director of Project Management from Houghton Mifflin Harcourt, global leader in publishing, as well as David Lyle, VP of product strategy from Informatica to discuss how to implement  a successful data governance practice that brings business impact to an enterprise organization.

If you are currently kicking the tires on setting up data governance practice in your organization,  I’d like to invite you to visit a member-only website dedicated to Data Governance:  http://governyourdata.com/. This site currently has over 1,000 members and is designed to foster open communications on everything data governance. There you will find conversations on best practices, methodologies, frame works, tools and metrics.  I would also encourage you to take a data governance maturity assessment to see where you currently stand on the data governance maturity curve, and compare the result against industry benchmark.  More than 200 members have taken the assessment to gain better understanding of their current data governance program,  so why not give it a shot?

Governyourdata.com

Governyourdata.com

Data Governance is a journey, likely a never-ending one.  We wish you best of the luck on this effort and a joyful ride! We love to hear your stories.

Share
Posted in Big Data, Data Governance, Data Integration, Data Quality, Enterprise Data Management, Master Data Management | Tagged , , , , , , , , , , , | 1 Comment

Informatica Doubled Big Data Business in 2014 As Hadoop Crossed the Chasm

Big Data

Informatica Doubled Big Data Business in 2014 As Hadoop Crossed the Chasm

2014 was a pivotal turning point for Informatica as our investments in Hadoop and efforts to innovate in big data gathered momentum and became a core part of Informatica’s business. Our Hadoop related big data revenue growth was in the ballpark of leading Hadoop startups – more than doubling over 2013.

In 2014, Informatica reached about 100 enterprise customers of our big data products with an increasing number going into production with Informatica together with Hadoop and other big data technologies.  Informatica’s big data Hadoop customers include companies in financial services, insurance, telcommunications, technology, energy, life sciences, healthcare and business services.  These innovative companies are leveraging Informatica to accelerate their time to production and drive greater value from their big data investments.

These customers are in-production or implementing a wide range of use cases leveraging Informatica’s great data pipeline capabilities to better put the scale, efficiency and flexibility of Hadoop to work.  Many Hadoop customers start by optimizing their data warehouse environments by moving data storage, profiling, integration and cleansing to Hadoop in order to free up capacity in their traditional analytics data warehousing systems. Customers that are further along in their big data journeys have expanded to use Informatica on Hadoop for exploratory analytics of new data types, 360 degree customer analytics, fraud detection, predictive maintenance, and analysis of massive amounts of Internet of Things machine data for optimization of energy exploration, manufacturing processes, network data, security and other large scale systems initiatives.

2014 was not just a year of market momentum for Informatica, but also one of new product development innovations.  We shipped enhanced functionality for entity matching and relationship building at Hadoop scale (a key part of Master Data Management), end-to-end data lineage through Hadoop, as well as high performance real-time streaming of data into Hadoop. We also launched connectors to NoSQL and analytics databases including Datastax Cassandra, MongoDB and Amazon Redshift. Informatica advanced our capabilities to curate great data for self-serve analytics with a connector to output Tableau’s data format and launched our self-service data preparation solution, Informatica Rev.

Customers can now quickly try out Informatica on Hadoop by downloading the free trials for the Big Data Edition and Vibe Data Stream that we launched in 2014.  Now that Informatica supports all five of the leading Hadoop distributions, customers can build their data pipelines on Informatica with confidence that no matter how the underlying Hadoop technologies evolve, their Informatica mappings will run.  Informatica provides highly scalable data processing engines that run natively in Hadoop and leverage the best of open source innovations such as YARN, MapReduce, and more.   Abstracting data pipeline mappings from the underlying Hadoop technologies combined with visual tools enabling team collaboration empowers large organizations to put Hadoop into production with confidence.

As we look ahead into 2015, we have ambitious plans to continue to expand and evolve our product capabilities with enhanced productivity to help customers rapidly get more value from their data in Hadoop. Stay tuned for announcements throughout the year.

Try some of Informatica’s products for Hadoop on the Informatica Marketplace here.

Share
Posted in B2B Data Exchange, Big Data, Data Integration, Data Services, Hadoop | Tagged , , , , , , | Leave a comment

Great Data Increases Value and De-Risks the Drone

great data

Great Data Increases Value and De-Risks the Drone

At long last, the anxiously awaited rules from the FAA have brought some clarity to the world of commercial drone use. Up until now, commercial drone use has been prohibited. The new rules, of course, won’t sit well with Amazon who would like to drop merchandise on your porch at all hours. But the rules do work really well for insurers who would like to use drones to service their policyholders. So now drones, and soon to be fleets of unmanned cars will be driving the roadways in any numbers of capacities. It seems to me to be an ambulance chaser’s dream come true. I mean who wouldn’t want some seven or eight figure payday from Google for getting rear-ended?

What about “Great Data”? What does that mean in the context of unmanned vehicles, both aerial and terrestrial? Let’s talk about two aspects. First, the business benefits of great data using unmanned drones.

An insurance adjuster or catastrophe responder can leverage an aerial drone to survey large areas from a central location. They will pin point the locations needing attention for further investigation. This is a common scenario that many insurers talk about when the topic of aerial drone use comes up. Second to that is the ability to survey damage in hard to reach locations like roofs or difficult terrain (like farmland). But this is where great data comes into play. Surveying, service and use of unmanned vehicles demands that your data can answer some of the following questions for your staff operating in this new world:

Where am I?

Quality data and geocoded locations as part of that data is critical. In order to locate key risk locations, your data must be able to coordinate with the lat/long of the location recorded by your unmanned vehicles and the location of your operator. Ensure clean data through robust data quality practices.

Where are my policyholders?

Knowing the location of your policyholders not only relies on good data quality, but on knowing who they are and what risks you are there to help service. This requires a total customer relationship solution where you have a full view of not only locations, but risks, coverages and entities making up each policyholder.

What am I looking at?

Archived, current and work in process imaging is a key place where a Big Data environment can assist over time. By comparing saved images with new and processing claims, claims fraud and additional opportunities for service can be detected quickly by the drone operator.

Now that we’ve answered the business value questions and leveraged this new technology to better service policyholders and speed claims, let’s turn to how great data can be used to protect the insurer and drone operator from liability claims. This is important. The FAA has stopped short of requiring commercial drone operators to carry special liability insurance, leaving that instead to the drone operators to orchestrate with their insurer. And now we’re back to great data. As everyone knows, accidents happen. Technology, especially robotic mobile technology is not infallible. Something will crash somewhere, hopefully not causing injury or death, but sadly that too will likely happen. And there is nothing that will keep the ambulance chasers at bay more than robust great data. Any insurer offering liability cover for a drone operator should require that some of the following questions be answered by the commercial enterprise. And the interesting fact is that this information should be readily available if the business questions above have been answered.

  • Where was my drone?
  • What was it doing?
  • Was it functioning properly?

Properly using the same data management technology as in the previous questions will provide valuable data to be used as evidence in the case of liability against a drone operator. Insurers would be wise to ask these questions of their liability policyholders who are using unmanned technology as a way to gauge liability exposure in this brave new world. The key to the assessment of risk being robust data management and great data feeding the insurer’s unmanned policyholder service workers.

Time will tell all the great and imaginative things that will take place with this new technology. One thing is for certain. Great data management is required in all aspects from amazing customer service to risk mitigation in operations.  Happy flying to everyone!!

Share
Posted in Big Data, Data Quality, Financial Services | Tagged , , , , | Leave a comment

The Sexiest Job of the 21st Century

Sexiest Job

The Sexiest Job of the 21st Century

I’ve spent most of my career working with new technology, most recently helping companies make sense of mountains of incoming data. This means, as I like to tell people, that I have the sexiest job in the 21st century.

Harvard Business Review put the data scientist into the national spotlight in their publication Data Scientist: The Sexiest Job of the 21st Century. Job trends data from Indeed.com confirms the rise in popularity for the position, showing that the number of job postings for data scientist positions increased by 15,000%.

In the meantime, the role of data scientist has changed dramatically. Data used to reside on the fringes of the operation. It was usually important but seldom vital – a dreary task reserved for the geekiest of the geeks. It supported every function but never seemed to lead them. Even the executives who respected it never quite absorbed it.

For every Big Data problem, the solution often rests on the shoulders of a data scientist. The role of the data scientist is similar in responsibility to the Wall Street “quants” of the 80s and 90s – now, these data experienced are tasked with the management of databases previously thought too hard to handle, and too unstructured to derive any value.

So, is it the sexiest job of the 21st Century?

Think of a data scientist more like the business analyst-plus, part mathematician, part business strategist, these statistical savants are able to apply their background in mathematics to help companies tame their data dragons. But these individuals aren’t just math geeks, per se.

A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a renaissance individual who really wants to learn and bring change to an organization.

If this sounds like you, the good news is demand for data scientists is far outstripping supply. Nonetheless, with the rising popularity of the data scientist – not to mention the companies that are hiring for these positions – you have to be at the top of your field to get the jobs.

Companies look to build teams around data scientists that ask the most questions about:

  • How the business works
  • How it collects its data
  • How it intends to use this data
  • What it hopes to achieve from these analyses

These questions were important because data scientists will often unearth information that can “reshape an entire company.” Obtaining a better understanding of the business’ underpinnings not only directs the data scientist’s research, but helps them present the findings and communicate with the less-analytical executives within the organization.

While it’s important to understand your own business, learning about the successes of other corporations will help a data scientist in their current job–and the next.

Twitter @bigdatabeat

Share
Posted in Architects, Big Data, Business/IT Collaboration, CIO, Data Governance, General, Governance, Risk and Compliance, Real-Time | Tagged , , | Leave a comment

Informatica and Pivotal Delivering Great Data to Customers

Informatica and Pivotal Delivering Great Data to Customers

Delivering Great Data to Customers

As we head into Strata + Hadoop World San Jose, Pivotal has made some interesting announcements that are sure to be the talk of the show. Pivotal’s move to open-source some of their advanced products (and to form a new organization to foster Hadoop community cooperation) are signs of the dynamism and momentum of the Big Data market.

Informatica applauds these initiatives by Pivotal and we hope that they will contribute to the accelerating maturity of Hadoop and its expansion beyond early adopters into mainstream industry adoption. By contributing HAWQ, GemFire and the Greenplum Database to the open source community, Pivotal creates further open options in the evolving Hadoop data infrastructure technology. We expect this to be well received by the open source community.

As Informatica has long served as the industry’s neutral data connector for more than 5,500 customers and have developed a rich set of capabilities for Hadoop, we are also excited to see efforts to try to reduce fragmentation in the Hadoop community.

Even before the new company Pivotal was formed, Informatica had a long history working with the Greenplum team to ensure that joint customers could confidently use Informatica tools to include the Greenplum Database in their enterprise data pipelines. Informatica has mature and high-performance native connectivity to load data in and out of Greenplum reliably using Informatica’s codeless, visual data pipelining tools. In 2014, Informatica expanded out Hadoop support to include Pivotal HD Hadoop and we have joint customers using Informatica to do data profiling, transformation, parsing and cleansing using Informatica Big Data Edition running on Pivotal HD Hadoop.

We expect these innovative developments driven by Pivotal in the Big Data technology landscape to help to move the industry forward and contribute to Pivotal’s market progress. We look forward to continuing to support Pivotal technology and to an ever increasing number of successful joint customers. Please reach out to us if you have any questions about how Informatica and Pivotal can help your organization to put Big Data into production. We want to ensure that we can help you answer the question … Are you Big Data Ready?

Share
Posted in Big Data, Data Governance, Hadoop | Tagged , , , , , | Leave a comment