Category Archives: Big Data
A lot of my time is spent discussing the enterprise and end-user value of software solutions. Increasingly over the last few years, the solution focus has shifted from specific applications and business processes to being data centric. People now start by thinking and asking about what data is collected, displayed, manipulated, and automated instead of what the task is (e.g., "we need to better understand how our customers make buying decisions" instead of "we need to streamline our account managers' daily tasks"). I have been working on a mental model for how to think about these different types of solutions, one that gives me a better framework when discussing product, technical, and marketing topics with clients or friends in the industry.
I came up with the following framework: a 2×2 matrix that uses two main axes to define the perceived value of data-centric solutions. These are the Volume & Complexity of Data Integration and the Completeness & Flexibility of Data Analytics.
The reason for these definitions is that one very real change is that most clients I work with are constantly dealing with distributed applications and business processes, which means figuring out how to bring that data together, either in a new solution or in an analytics solution that can work across the various data sets. There is no single right answer to these issues, but there are very real patterns in how different companies and solutions approach the underlying issue of growing distributed data inside and outside the company's control.
1. Personal Productivity. These are solutions that collect and present data mostly for individual use, team data sharing and organization. They tend to be single task oriented and provide data reporting functions.
2. Business Productivity. These solutions usually span multiple data sources and are focused on either decision support, communication or collaboration.
3. Business Criticality. These solutions provide new value or capabilities to an organization by adding advanced data analytics that provide automated responses or secondary views across distributed data sources.
4. Life Criticality. These solutions are a special subset aimed at individual, group, or social impact. Traditionally these have been very proprietary, closed systems. The main trend in data-centric solutions is coming from more government and business data being exposed, which can be integrated into new solutions that we previously could not build, let alone think up. I do not even have a good example of a real one yet, but I see it as the higher-level solution that evolves at the juncture where real-time data meets analytics and distributed data sets.
Here are some examples of current solutions as I would map them on the perceived-value framework. Some of these are well known, and others you have probably never heard of. Many of these new solutions would not have been easy to create without technology that provides easier access to data from distributed resources or the compute power to support decision support.
What I really like about this value framework is that it allows us to get beyond the buzzwords of IoT, Big Data, etc., and focus on the real needs and solutions that cross over these technical, singular topics but on their own are not actual high-value business solutions. Feedback welcome.
For those hoping to push through a hard-hitting analytics effort that will serve as a beacon of light within an otherwise calcified organization, there's probably a lot of work cut out for you. Evolving into an organization that fully grasps the power and opportunities of data analytics requires cultural change, and this is a challenge organizations have only begun to grasp.
"Sitting down with pizza and coffee could get you around most of the technical challenges," explained Sam Ransbotham, PhD, associate professor at Boston College, at a recent panel webcast hosted by MIT Sloan Management Review, "but the cultural problems are much larger."
That's one of the key takeaways from the panel, in which Ransbotham was joined by Tuck Rickards, head of the digital transformation practice at Russell Reynolds Associates, a digital recruiting firm, and Denis Arnaud, senior data scientist at Amadeus Travel Intelligence. The panel, which examined the impact of corporate culture on data analytics, was led by Michael Fitzgerald, contributing editor at MIT Sloan Management Review.
The path to becoming an analytics-driven company is a journey that requires transformation across most or all departments, the panelists agreed. “It’s fundamentally different to be a data-driven decision company than kind of a gut-feel decision-making company,” said Rickards. “Acquiring this capability to do things differently usually requires a massive culture shift.”
That’s because the cultural aspects of the organization – “the values, the behaviors, the decision making norms and the outcomes go hand in hand with data analytics,” said Ransbotham. “It doesn’t do any good to have a whole bunch of data processes if your company doesn’t have the culture to act on them and do something with them.” Rickards adds that bringing this all together requires an agile, open source mindset, with frequent, open communication across the organization.
So how does one go about building and promoting a culture that is conducive to getting the maximum benefit from data analytics? The most important piece is bringing aboard people who are aware of and skilled in analytics, both from within the enterprise and from outside, the panelists urged. Ransbotham points out that it may seem daunting, but it's not. "This is not some gee-whizz thing," he said. "We have to get rid of this mindset that these things are impossible. Everybody who has figured it out has figured it out somehow. We're a lot more able to pick up on these things than we think — the technology is getting easier, it doesn't require quite as much as it used to."
The key to evolving corporate culture to becoming more analytics-driven is to identify or recruit enlightened and skilled individuals who can provide the vision and build a collaborative environment. “The most challenging part is looking for someone who can see the business more broadly, and can interface with the various business functions –ideally, someone who can manage change and transformation throughout the organization,” Rickards said.
Arnaud described how his organization, an online travel service, went about building an esprit de corps between data analytics staff and business staff to ensure the success of their company's analytics efforts. "Every month all the teams would do a hands-on workshop, together in some place in Europe [Amadeus is headquartered in Madrid, Spain]." For example, a workshop might focus on a market analysis for a specific customer, and the participants would explore the entire end-to-end process for working with the customer, "from the data collection through data crunching and so on. The one knowing the data analysis techniques would explain them, and the one knowing the business would explain that, and so on." As a result of these monthly workshops, business and analytics team members have found it "much easier to collaborate," he added.
Web-oriented companies such as Amadeus – or Amazon and eBay for that matter — may be paving the way with analytics-driven operations, but companies in most other industries are not at this stage yet, both Rickards and Ransbotham point out. The more advanced web companies have built “an end-to-end supply chain, wrapped around customer interaction,” said Rickards. “If you think of most traditional businesses, financial services or automotive or healthcare are a million miles away from that. It starts with having analytic capabilities, but it’s a real journey to take that capability across the company.”
The analytics-driven business of the near future, regardless of industry, will likely be staffed with roles not seen today. "If you are looking to re-architect the business, you may be imagining roles that you don't have in the company today," said Rickards. Along with the need for chief analytics officers, data scientists, and data analysts, there will be many new roles created. "If you are on the analytics side of this, you can be in an analytics group or a marketing group, with more of a CRM or customer insights title. You can be in planning or business functions. In a similar way on the technology side, there are people very focused on architecture and security."
Ultimately, the demand will be for leaders and professionals who understand both the business and technology sides of the opportunity, Rickards continued. "You can have good people building a platform, and you can have good data scientists," he added. "But you better have someone on the top of that organization knowing the business purpose."
I live in a small town in Maine. Between my town and the surrounding three towns, there are seven Main Streets and three Annis Roads or Lanes (and don't get me started on the number of Moose Trails). If your insurance company wants to market to or communicate with someone in my town or one of the surrounding towns, how can you ensure that the address that you are sending material to is correct? What is the cost if material is sent to an incorrect or outdated address? What is the cost to your insurance company if a provider sends the bill out to the wrong address?
How much is poor address quality costing your business? It doesn't just impact marketing, where inaccurate address data translates into missed opportunity; it also means significant waste in materials, labor, time, and postage. Bills may be delivered late or returned with sender unknown, meaning additional handling time, possible repackaging, additional postage costs (Address Correction Penalties), and the risk of customer service issues. When mail or packages don't arrive, pressure on your customer support team can increase and your company's reputation can be negatively impacted. Bills and payments may arrive late or not at all, directly impacting your cash flow. The cost of bad address data causes inefficiencies and raises costs across your entire organization.
The best method for handling address correction is through a validation and correction process:
When trying to standardize member or provider information, one of the first places to look is address data. If you can determine that the John Q Smith who lives at 134 Main St in Northport, Maine 04843 is the same John Q Smith who lives at 134 Maine Street in Lincolnville, Maine 04849, you have provided a link between two members that are probably considered distinct in your systems. Once you can validate that there is no 134 Main St in Northport according to the postal service, and then validate that 04849 is a valid zip code for Lincolnville, you can standardize your address format to something along the lines of: 134 MAIN ST LINCOLNVILLE, ME 04849. Now you have a consistent layout for all of your addresses that follows postal service standards. Each member now has a consistent address, which will make the next step, creating a golden record for each member, that much simpler.
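The validate-and-standardize flow can be sketched in a few lines. This is a toy illustration, not a real postal validator: the lookup tables (`ZIP_TO_TOWN`, `STREET_NAMES`, `SUFFIXES`, `STATES`) are invented stand-ins for postal-service reference data, which a production system would source from a certified address-validation service.

```python
SUFFIXES = {"STREET": "ST", "ROAD": "RD", "LANE": "LN", "AVENUE": "AVE"}
STATES = {"MAINE": "ME"}
# Hypothetical reference data; in practice this comes from postal-service files.
ZIP_TO_TOWN = {"04849": "LINCOLNVILLE"}
STREET_NAMES = {"MAINE": "MAIN"}  # known alias/misspelling corrections

def standardize(street, town, state, zip_code):
    """Uppercase, abbreviate, and cross-check the town against the ZIP."""
    parts = street.upper().replace(".", "").split()
    parts = [STREET_NAMES.get(p, p) for p in parts]
    parts = [SUFFIXES.get(p, p) for p in parts]
    state = STATES.get(state.upper(), state.upper())
    # If the ZIP is in the reference data, trust the town it maps to.
    town = ZIP_TO_TOWN.get(zip_code, town.upper())
    return f"{' '.join(parts)} {town}, {state} {zip_code}"

# Both variants collapse to one canonical form:
a = standardize("134 Main St", "Northport", "Maine", "04849")
b = standardize("134 Maine Street", "Lincolnville", "Maine", "04849")
```

Once both records standardize to the same string, linking them into a single golden record becomes a simple equality check rather than fuzzy guesswork.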
Think about your current method of managing addresses. Likely, there are several different systems that capture addresses, with different standards for what data is allowed into each field, and quite possibly these independent applications are not checking or validating against country postal standards. By improving the quality of address data, you are one step closer to creating high-quality data that can provide the up-to-the-minute accurate reporting your organization needs to succeed.
I found a truly cringe-worthy article today that shows what popular websites looked like more than a decade ago and what they look like today. Looking back to what was cutting-edge in 1996, or even 2006, is as bad as fingernails on a chalkboard compared to the modern homepages of popular sites today.
These websites are still well-used today, staying with the times and leading the way we design digital experiences. The key was change over many years of research and understanding of user experience. These sites stay modern, adapting to different web-enabled devices and experiences that the end user will encounter. Common among them are beautiful imagery, clear calls to action, and a sophisticated understanding of what people want on a homepage.
Can you imagine if any of those sites had stayed the same and never changed? We would not be using them today if that were the case. Their popularity would wane. Change is never easy, but it is usually necessary to stay relevant.
Web designers in 1996 could not imagine what the internet would be like in 2015, although they would probably agree there was a lot of potential. A modern equivalent is the implications of big data throughout the enterprise.
Data-driven marketers today are wondering how they can gain insight from big data. The answer? The ability to change is the connection between big data and insight. Data-driven marketers today know that their roles are changing: 68% of marketers think that marketing has seen more changes in the last two years than it has in the past 50 years, according to a recent survey. The changes are due to a renewed focus on customer experience within their jobs, and the need to use big data to improve that experience.
Big data should drive insights that change businesses, but is the real reason marketers aren’t sure about how to use big data tied to the change that it requires? Leading change in an organization is never easy, but it is definitely necessary.
What insights do you want from big data, and what value can you derive from them? If your reason for using big data is customer behavior insights, how will knowing how a customer behaves influence any changes in your approach?
In a recent survey reported by the National Retail Federation, retailers cited these three top reasons for using big data:
- Analyzing customer behavior (56 percent)
- Bringing together different data sources (49 percent)
- Improving personalization (48 percent)
What are your reasons for using big data?
Data-driven marketers can drown in too much information if they look at massive datasets without a question in mind that they want to answer. The question being asked often implies that a business must change to stay modern and relevant to its customers. Could concern over a need for great change be the roadblock to data-driven marketers who could be using data for valuable insights?
Big data has gotten a lot of buzz in the last few years. Data-driven marketers can move the big data concept from fuzzy, unrealized potential to a major part of how their business operates successfully.
Learn more in this white paper for marketers, The Secret to a Successful Customer Journey.
I recently got to meet with a very enlightened insurance company which was actively turning its SWOTT analysis (with the second T being trends) into concrete action. They shared with me that they view their go-forward "right to win" as being determined by the quality of customer experience they deliver through their traditional channels and, increasingly, through "digital channels". One marketing leader joked early on that "it's no longer about the money; it is about the experience". The marketing and business leaders that I met with made it extremely clear that they have a sense of urgency to respond to what they saw as significant market changes on the horizon. What this company wanted to achieve was a single view of customer across each of its distribution channels as well as its agent population. Typical of many businesses today, they had determined that they needed an automated, holistic view into things like customer history. Smartly, this business wanted to put together its existing customer data with its customer leads.
Using Data to Accelerate the Percentage of Customers that are Cross-Sold
Taking this step was seen as allowing them to understand when an existing customer is also a lead for another product. With this knowledge, they wanted to provide special offers to accelerate the conversion from lead to customer with more than one product. What they wanted to do here reminded me of the capabilities of 58.com, eBay, and other Internet pure plays. The reason for doing this well was described recently by Gartner, which suggests that business success is increasingly determined by what it calls "business moments". Without a first-rate experience that builds upon what it already knows about its customers, this insurance company worries it could be at increasing risk from Internet pure plays. As important, the degree of cross-sell is for many businesses a major determinant of whether a customer is profitable or not.
Getting Customer Data Right is Key to Developing a Winning Digital Experience
To drive a first rate digital experience, this insurance company wanted to apply advanced analytics to a single view of customer and prospect data. This would allow them to do things like conduct nearest neighbor predictive analysis and modeling. In this form of analysis, “the goal is to predict whether a new customer will respond to an offer based on how other similar customers have responded” (Data Science for Business, Foster Provost, O’Reilly, 2013, page 147).
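The nearest-neighbor approach quoted above can be sketched in plain Python: score a new prospect by majority vote among the most similar past customers. The customer features (age, number of policies) and response labels below are invented purely for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train, labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Each row: (age, number of policies held); label 1 = responded to a past offer.
customers = [(25, 1), (30, 1), (35, 2), (50, 3), (55, 4), (60, 3)]
responded = [0, 0, 0, 1, 1, 1]

# A 52-year-old prospect holding 3 policies sits among past responders,
# so the model predicts a response (1).
prediction = knn_predict(customers, responded, (52, 3))
```

In practice the "single view of customer" is what makes the training rows possible at all: the features have to be assembled from data scattered across many systems before any such model can be fit.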
What has been limiting this business, like so many others, is that its customer data is scattered across many enterprise systems. For just one division, they have more than one Salesforce instance. Yet this company's marketing team knew that to keep its customers, it needed to be able to serve them omnichannel and establish a single, unified customer experience. To make this happen, they needed, for the first time, to share holistic customer information across their ecosystems. At the same time, they knew they would need to protect their customers' privacy; only certain people would be able to see certain information. They wanted the ability to selectively mask data by role, protecting consumers in particular by allowing only certain users (those, in defense parlance, with a need to know) to see a subset of the holistic set of information collected. When asked about the need for a single view of customer, the digital marketing folks openly shared that they perceived the potential for external digital market entrants, a la Porter's five forces of competition. This firm saw such entrants either taking market share from them or, over time, effectively disintermediating them from their customers as more and more customers move their insurance purchasing to the web. Given the risk, their competitive advantage needed to move to knowing their customers better and being able to respond to them better on the web. This clearly included responding to the new entrants trying to win their customers, in the language of Theodore Levitt.
Competing on Customer Experience
In sum, this insurance company smartly felt that it needed to "compete on customer experience" (a new phrase for me), and this required superior knowledge of existing and new customers. This means they needed as complete and correct a view of customers as possible, including addresses, connection preferences, and increasingly social media responses. It means competitively responding directly to those that have honed their skills in web design, social presence, and advanced analytics. To do this, they will create predictive capabilities that make use of their superior customer data. With this prescience of thinking, this moment need not be like the strategic collision of Starbucks and fast food vendors, where the desire to grow forced competition between the existing player and new entrants wanting to claim a portion of the existing player's business.
Blogs and Articles
Last week I had the opportunity to attend the Data Mania industry event hosted by Informatica. The afternoon event was a nice mix of industry panels with technical and business speakers from companies that included Amazon, Birst, AppDynamics, Salesforce, Marketo, Tableau, Adobe, Informatica, Dun & Bradstreet, and several others.
A main theme of the event that came through with so many small, medium and large SaaS vendors was that everyone is increasingly dependent on being able to integrate data from other solutions and platforms. The second part of this was that customers increasingly expect the data integration requirements to work under the covers so they can focus on the higher level business solutions.
I really thought the four companies presented by Informatica as the winners of their Connect-a-Thon contest were the highlight. Each of these solutions was built by a company and highlighted some great aspects of data integration.
Databricks provides a cloud platform for big data processing. The solution leverages Apache Spark, an open source engine for big data processing that has seen a lot of adoption. Spark is the engine for the Databricks Cloud, which then adds several enterprise features for visualization, large-scale Spark cluster management, workflow, and integration with third-party applications. Having a big data solution means bringing data in from a lot of SaaS and on-premise sources, so Databricks built a connector to Informatica Cloud to make it easier to load data into the Databricks Cloud. Again, it's a great example of an ecosystem where higher-level solutions can leverage third-party services.
ThoughtSpot provides a search-based BI solution. The general idea is that a search-based interface gives a much broader group of users, with little training, access to the power of enterprise business intelligence tools. It reminds me of some other solutions that fall into the enterprise 2.0 area and do everything from expert location to finding structured and unstructured data more easily. They wrote a nice blog post explaining why they built the ThoughtSpot Connector for Informatica Cloud. The main reason: they are using Informatica to handle the data integration so they can focus on their own solution, the end-user-facing BI tools. It's an example of SaaS providers choosing either to roll their own data integration or to leverage other providers as part of their solution.
BigML provides some very interesting machine learning solutions. The simple summary would be that they are trying to create beautiful visualization and predictive modeling tools. The solution greatly simplifies the process of iterating on models and visualizing them. Their gallery of models has several very good examples. Again, in this case BigML built a connector to Informatica Cloud for SaaS and on-premise integration, working in conjunction with the existing BigML REST API. BigML wrote a great blog post on their connector that goes into more detail.
FollowAnalytics had one of the more interesting demonstrations because it was a very different solution than the other three. They have a mobile marketing platform that is used to drive end-user engagement and measure that engagement. They also uploaded their Data Mania integration demo here. They are mostly leveraging the data integration to provide access to important data sources that can help drive customer engagement in their platform. Given that their end users are mostly marketing or business analysts, they simply expect to be able to easily get the data they want and need to drive marketing analysis and engagement.
My takeaway from talking to many of the SaaS vendors was that there is a lot of interest in leveraging higher-level infrastructure, platform, and middleware services as they mature to meet the real needs of SaaS vendors, so that the vendors can focus on their own solutions. In many cases, the ecosystem may be more ready than vendors realize.
I won’t say I’ve seen it all; I’ve only scratched the surface in the past 15 years. Below are some of the mistakes I’ve made or fixed during this time.
MongoDB as your Big Data platform
Why am I picking on MongoDB? Because it is the NoSQL database most abused at this point. While Mongo has an aggregation framework that tastes like MapReduce, and even a very poorly documented Hadoop connector, its sweet spot is as an operational database, not an analytical system.
RDBMS schema as files
You dumped each table from your RDBMS into a file, stored it on HDFS, and now plan to use Hive on it. Know that Hive will be slower than your RDBMS; it will use MapReduce even for a simple select. Next, look at row sizes: you have flat files measured in single-digit kilobytes.
Hadoop does best on large sets of relatively flat data. I’m sure you can create an extract that’s more de-normalized.
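The de-normalized extract can be sketched simply: instead of dumping many small per-table files that Hive must join at query time, pre-join them into one wide, flat record per fact before loading to HDFS. The table contents below are invented for illustration.

```python
# Source tables, as they might come out of an RDBMS extract.
customers = {1: {"name": "Acme", "region": "NE"},
             2: {"name": "Globex", "region": "SW"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250.0},
    {"order_id": 11, "customer_id": 2, "total": 75.5},
]

# One wide, flat record per order: the shape Hadoop handles best.
# Each row carries its customer attributes, so Hive never has to join.
flat = [{**order, **customers[order["customer_id"]]} for order in orders]
```

The trade-off is redundancy (customer attributes repeat on every order row), which is cheap on HDFS and exactly what large sequential scans are good at.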
Data ponds and data swamps

Instead of creating a single data lake, you created a series of data ponds, or a data swamp. Conway's law has struck again: your business groups have created their own mini-repositories and data analysis processes. That doesn't sound bad at first, but with different extracts and ways of slicing and dicing the data, you end up with different views of the data, i.e., different answers to some of the same questions.
Schema-on-read doesn’t mean, “Don’t plan at all,” but it means “Don’t plan for every question you might ask.”
Missing use cases
Vendors, to escape the constraints of departmental funding, are selling the idea of the data lake. The byproduct is that the business loses sight of real use cases. The data-lake approach can be valid, but you won't get much out of it if you don't have actual use cases in mind.
It isn't hard to come up with use cases, but too often they are an afterthought. The business should start thinking about use cases before its databases can't handle the load.
To do a larger bit of analytics, you may need a bigger tool set, one that may include Hive, Pig, MapReduce, R, and more.
As I have shared within the posts of this series, businesses are using analytics to improve their internal and external facing business processes and to strengthen their "right to win" within the markets in which they operate. Like healthcare institutions across the country, UPMC is striving to improve its quality of care and business profitability. One educational healthcare CEO put it to me this way: "if we can improve our quality of service, we can reduce costs while we increase our pricing power". In UPMC's case, they believe that the vast majority of their costs are in a fraction of their patients, but they want to prove this with real data and then use this information to drive their go-forward business strategies.
Getting more predictive to improve outcomes and reduce costs
Armed with this knowledge, UPMC's leadership wanted to use advanced analytics and predictive modeling to improve clinical and financial decision making. Taking this action was seen as producing better patient outcomes and reducing costs. A focus area for analysis involved creating "longitudinal records" for the complete cost of providing particular types of care. For those not versed in time-series analysis, longitudinal analysis uses a series of observations obtained from many respondents over time to derive a relevant business insight. When I was involved in healthcare, I used this type of analysis to interrelate employee and patient engagement results with healthcare outcomes. In UPMC's case, they wanted to use this type of analysis to understand, for example, the total end-to-end cost of a spinal surgery: looking beyond the cost of the surgery itself to account for pre-surgery care and recovery-related costs. However, to do this for the entire hospital meant bringing together data from hundreds of sources across UPMC and outside entities, including labs and pharmacies. By having this information, UPMC's leadership saw the potential to create an accurate and comprehensive view that could be used to benchmark future procedures. Additionally, UPMC saw the potential to automate the creation of patient problem lists or examine clinical practice variations. But like the other case studies that we have reviewed, these steps required trustworthy, authoritative data that could be accessed with agility and ease.
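The longitudinal-record idea, many observations rolled up into one end-to-end view per episode of care, can be sketched in a few lines. The episode, phase, source, and cost records below are invented purely for illustration; a real system would feed this from the hundreds of integrated sources described above.

```python
from collections import defaultdict

# Observations from many systems (labs, hospital billing, pharmacies),
# keyed by the episode of care they belong to.
observations = [
    {"episode": "spine-001", "phase": "pre-surgery", "source": "lab", "cost": 1200.0},
    {"episode": "spine-001", "phase": "surgery", "source": "hospital", "cost": 45000.0},
    {"episode": "spine-001", "phase": "recovery", "source": "pharmacy", "cost": 800.0},
]

# Roll every observation up into one end-to-end cost per episode.
end_to_end = defaultdict(float)
for obs in observations:
    end_to_end[obs["episode"]] += obs["cost"]
```

The hard part, as the post notes, is not this aggregation but getting the source records trustworthy and consistently keyed in the first place.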
UPMC starts with a large, multiyear investment
In October 2012, UPMC made a $100 million investment to establish an enterprise analytics initiative to bring together, for the first time, clinical, financial, administrative, genomic, and other information in one place. Tom Davenport, the author of Competing on Analytics, suggests in his writing that establishing an enterprise analytics capability represents a major step forward because it allows enterprises to answer the big questions, to better tie strategy and analytics, and to finally rationalize application interconnect and business intelligence spending. As UPMC put its plan together, it realized that it needed to impact more than 1200 applications. It also realized that it needed one system to manage data integration, master data management, and eventually complex event processing capabilities. At the same time, it addressed the people side of things by creating a governance team to manage data integrity improvements, ensuring that trusted data populates enterprise analytics and provides transparency into data integrity challenges. One of UPMC's goals was to provide self-service capabilities. According to Terri Mikol, a project leader, "We can't have people coming to IT for every information request. We're never going to cure cancer that way." Here is an example of the promise that occurred within the first eight months of this project. Researchers were able to integrate, for the first time ever, clinical and genomic information on 140 patients previously treated for breast cancer. Traditionally, these data have resided in separate information systems, making it difficult, if not impossible, to integrate and analyze dozens of variables. The researchers found intriguing molecular differences in the makeup of pre-menopausal vs. post-menopausal breast cancer, findings which will be further explored. For UPMC, this initial cancer insight is just the starting point of their efforts to mine massive amounts of data in the pursuit of smarter medicines.
Building the UPMC Enterprise Analytics Capability
To create their enterprise analytics platform, UPMC determined it was critical to establish "a single, unified platform for data integration, data governance, and master data management," according to Terri Mikol. The solution required a number of key building blocks. The first was data integration, to collect and cleanse data from hundreds of sources and organize it into repositories that would enable fast, easy analysis and reporting by and for end users.
Specifically, the UPMC enterprise analytics capability pulls clinical and operational data from a broad range of sources, including systems for managing hospital admissions, emergency room operations, patient claims, health plans, and electronic health records, as well as external databases that hold registries of genomic and epidemiological data needed for crafting personalized and translational medicine therapies. UPMC has integrated quality-checked source data in accordance with industry-standard healthcare information models. This effort included putting together capabilities around data integration, data quality, and master data management to manage transformations and enforce consistent definitions of patients, providers, facilities, and medical terminology.
As noted, the cleansed and harmonized data is organized into specialized genomics databases, multidimensional warehouses, and data marts. The approach makes use of traditional data warehousing approaches as well as big data capabilities to handle unstructured data and natural language processing. UPMC has also deployed analytical tools that allow end users to exploit the data enabled by the Enterprise Analytics platform. The tools drive everything from predictive analytics and cohort tracking to business and compliance reporting. And UPMC did not stop here. If their data had value, then it needed to be secured. UPMC created data audits and data governance practices. As well, they implemented a dynamic data masking solution that ensures data security and privacy.
As I have discussed, many firms are pushing point silo solutions into their environments, but as UPMC shows, this limits their ability to ask the bigger business questions or, in UPMC’s case, to discover things that can change people’s lives. Analytics are increasingly a business enabler when they are organized as an enterprise analytics capability. As well, I have come to believe that analytics have become a foundational capability for every firm’s right to win. Analytics inform a coherent set of capabilities and establish a firm’s go-forward right to win. For this, UPMC is a shining example of getting things right.
Author Twitter: @MylesSuer
On March 25th, Josh Lee, Global Director for Insurance Marketing at Informatica and Cindy Maike, General Manager, Insurance at Hortonworks, will be joining the Insurance Journal in a webinar on “How to Become an Analytics Ready Insurer”.
Register for the Webinar on March 25th at 10am Pacific/ 1pm Eastern
Josh and Cindy exchanged perspectives on what “analytics ready” really means for insurers, and today we are sharing some of their views on the five questions posed here (join the webinar to learn more). Please join Insurance Journal, Informatica and Hortonworks on March 25th for more on this exciting topic.
See the Hortonworks site for a second posting of this blog and more details on exciting innovations in Big Data.
- What makes a big data environment attractive to an insurer?
CM: Many insurance companies are using new types of data to create innovative products that better meet their customers’ risk needs. For example, we are seeing insurance for “shared vehicles” and new products for prevention services. Much of this innovation is made possible by the rapid growth in sensor and machine data, which the industry incorporates into predictive analytics for risk assessment and claims management.
Customers who buy personal lines of insurance also expect the same type of personalized service and offers they receive from retailers and telecommunication companies. They expect carriers to have a single view of their business that permeates customer experience, claims handling, pricing and product development. Big data in Hadoop makes that single view possible.
JL: Let’s face it, insurance is all about analytics. Better analytics lead to better pricing, reduced risk and better customer service. But here’s the issue: existing data stores are costly for storing vast amounts of data and too inflexible to adapt to the changing needs of innovative analytics. Imagine kicking off a simulation or modeling routine one evening only to return in the morning and find it incomplete, or lacking data that requires a special request of IT.
This is where big data environments are helping insurers. Larger, more flexible data sets allow longer series of analytics to be run, generating better results. And imagine doing all that at a fraction of the cost and time of traditional data structures. Oh, and heaven forbid you ask a mainframe to do any of this.
- So we hear a lot about Big Data being great for unstructured data. What about traditional data types that have been used in insurance forever?
CM: Traditional data types are very important to the industry – they drive our regulatory reporting and much of our performance management reporting. This data will continue to play a very important role in the insurance industry and for individual companies.
However, big data can now enrich that traditional data with new data sources for new insights. In areas such as customer service and product personalization, it can make the difference between cross-selling the right products to meet customer needs and losing the business. For commercial and group carriers, the new data provides the ability to better analyze risk needs, price accordingly and enable superior service in a highly competitive market.
JL: Traditional data will always be around. I doubt that I will outlive a mainframe installation at an insurer, which makes me a little sad. And for many rote tasks like financial reporting, a sales report, or a commission statement, those systems are sufficient. However, the business of insurance is changing in leaps and bounds. Innovators in data science are interested in correlating those traditional sources with other, more creative data to find new products or areas to reduce risk. There is just a lot of data that is either ignored or locked in obscure systems that needs to be brought into the light. This data could be structured or unstructured; it doesn’t matter, and Big Data can assist in either case.
- How does this fit into an overall data management function?
JL: At the end of the day, a Hadoop cluster is another source of data for an insurer. More flexible, more cost effective and higher speed; but yet another data source for an insurer. So that’s one more on top of relational, cubes, content repositories, mainframes and whatever else insurers have latched onto over the years. So if it wasn’t completely obvious before, it should be now. Data needs to be managed. As data moves around the organization for consumption, it is shaped, cleaned, copied and we hope there is governance in place. And the Big Data installation is not exempt from any of these routines. In fact, one could argue that it is more critical to leverage good data management practices with Big Data not only to optimize the environment but also to eventually replace traditional data structures that just aren’t working.
CM: Insurance companies are blending new and old data and looking for the best ways to leverage “all data”. We are witnessing the development of a new generation of advanced analytical applications to take advantage of the volume, velocity, and variety in big data. We can also enhance current predictive models, enriching them with the unstructured information in claim and underwriting notes or diaries along with other external data.
There will be challenges. Insurance companies will still need to make important decisions on how to incorporate the new data into existing data governance and data management processes. The Chief Data or Chief Analytics officer will need to drive this business change in close partnership with IT.
- Tell me a little bit about how Informatica and Hortonworks are working together on this?
JL: For years Informatica has been helping our clients to realize the value in their data and analytics. And while enjoying great success in partnership with our clients, unlocking the full value of data requires new structures, new storage and something that doesn’t break the bank for our clients. So Informatica and Hortonworks are on a continuing journey to show that value in analytics comes with strong relationships between the Hadoop distribution and innovative market leading data management technology. As the relationship between Informatica and Hortonworks deepens, expect to see even more vertically relevant solutions and documented ROI for the Informatica/Hortonworks solution stack.
CM: Informatica and Hortonworks optimize the entire big data supply chain on Hadoop, turning data into actionable information to drive business value. By incorporating data management services into the data lake, companies can store and process massive amounts of data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field.
Matching data from internal sources (e.g. very granular data about customers) with external data (e.g. weather data or driving patterns in specific geographic areas) can unlock new revenue streams.
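The internal-plus-external blend described above can be sketched very simply: join granular internal records to an external signal keyed by a shared attribute, then derive a new measure from the combination. The data, field names, and risk multipliers below are invented for illustration; they are not Informatica or Hortonworks products or data.

```python
# Hypothetical sketch: enrich internal policy records with an external
# hazard signal (e.g. derived from regional weather data) to produce a
# risk-adjusted view. All values here are made up for illustration.
policies = [
    {"policy_id": "P1", "region": "FL", "premium": 1200},
    {"policy_id": "P2", "region": "TX", "premium": 950},
]
weather_risk = {"FL": 1.35, "TX": 1.10}  # external hazard multipliers

def enrich(policies, risk_by_region):
    """Attach an external risk factor and a risk-adjusted premium."""
    out = []
    for p in policies:
        factor = risk_by_region.get(p["region"], 1.0)  # default: neutral
        out.append({**p, "risk_factor": factor,
                    "adjusted_premium": round(p["premium"] * factor, 2)})
    return out

for row in enrich(policies, weather_risk):
    print(row)
```

At big data scale the same join would run across a Hadoop cluster rather than in memory, but the shape of the enrichment is the same.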
See this video for a discussion on unlocking those new revenue streams. Sanjay Krishnamurthi, Informatica CTO, and Shaun Connolly, Hortonworks VP of Corporate Strategy, share their perspectives.
- Do you have any additional comments on the future of data in this brave new world?
CM: My perspective is that, over time, we will drop the reference to “big” or “small” data and get back to referring simply to “data”. The term big data has been useful for describing the growing awareness of how new data types can help insurance companies grow.
We can no longer use “traditional” methods to gain insights from data. Insurers need a modern data architecture to store, process and analyze data—transforming it into insight.
We will see an increase in new market entrants in the insurance industry, and existing insurance companies will improve their products and services based upon the insights they have gained from their data, regardless of whether that was “big” or “small” data.
JL: I’m sure that even now there is someone locked in their mother’s basement playing video games and trying to come up with the next data storage wave. So we have that to look forward to, and I’m sure it will be cool. But, if we are honest with ourselves, we’ll admit that we really don’t know what to do with half the data that we have. So while data storage structures are critical, the future holds even greater promise for new models, better analytical tools and applications that can make sense of all of this and point insurers in new directions. The trend that won’t change anytime soon is the ongoing need for good quality data, data ready at a moment’s notice, safe and secure and governed in a way that insurers can trust what those cool analytics show them.
Please join us for an interactive discussion on March 25th at 10am Pacific Time/ 1pm Eastern Time.
Register for the Webinar on March 25th at 10am Pacific/ 1pm Eastern
EpicMix is a website, data integration solution and web application that provides a great example of how companies can provide more value to their customers when they think about data-ready architecture. In this case the company is Vail Resorts and it is great to look at this as an IoT case study since the solution has been in use since 2010.
The basics of EpicMix
* RFID technology embedded into lift tickets provides the ability to collect data for anyone using one at any Vail-managed resort. Vail realized they had all these lift tickets being worn and there was an opportunity to use them to collect data that could enhance the experience of their guests. It is also a very clever way to collect data on skiers to help drive segmentation and marketing decisions.
* EpicMix just works. If guests want to take advantage, all they have to do is register on the website or through the mobile app for their Android or iOS smartphone. Keeping the bar to entry low is important for getting people to try out the app, and even if people do not use the EpicMix website or app, Vail is still able to leverage the data they generate to better understand what people do on the mountain. (Vail has a detailed information policy and opt-out policy.)
* Value-added features beyond data visibility. What makes the solution more interesting are the features that go beyond just tracking skiing performance. These include private messaging between guests while on the mountain, sharing photos with friends, integration with personal social media accounts, and the ability to earn badges and participate in challenges. These go beyond first-generation solutions that tracked performance and nothing else.
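The data-collection side of a system like this is easy to picture: each RFID scan at a lift becomes an event, and events are rolled up into the per-skier statistics the app surfaces. The event shape, lift names, and vertical-feet figures below are assumptions for illustration, not EpicMix's actual data model.

```python
from collections import defaultdict

# Hypothetical lift metadata: vertical feet gained per ride.
LIFT_VERTICAL_FT = {"gondola_one": 1750, "chair_4": 1200}

# Hypothetical RFID scan events captured at lift gates.
scans = [
    {"pass_id": "A17", "lift": "gondola_one"},
    {"pass_id": "A17", "lift": "chair_4"},
    {"pass_id": "B42", "lift": "chair_4"},
]

def daily_stats(scans):
    """Aggregate scan events into rides and vertical feet per pass."""
    stats = defaultdict(lambda: {"rides": 0, "vertical_ft": 0})
    for s in scans:
        st = stats[s["pass_id"]]
        st["rides"] += 1
        st["vertical_ft"] += LIFT_VERTICAL_FT.get(s["lift"], 0)
    return dict(stats)

print(daily_stats(scans))
# → {'A17': {'rides': 2, 'vertical_ft': 2950},
#    'B42': {'rides': 1, 'vertical_ft': 1200}}
```

The same aggregated stream is what makes both the guest-facing features (badges, challenges) and Vail's segmentation analytics possible.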
This is the type of solution that qualifies as both an IoT Personal Productivity solution and a Business Productivity solution.
- For skiers, it provides information on their activity, communication with other guests, and sharing on social media.
- For Vail, it allows them to better understand their guests, to communicate with and offer guests additional services and benefits, and to make better decisions about how to use resources and deploy employees.
The EpicMix solution was made possible by taking advantage of data that was not previously being collected and then making it useful to users (skiers and guests). Having used EpicMix and similar performance tracking solutions, I find that the added communication and collaboration features are what set it apart, and the ease of getting started makes it a great example of how fresh data can come from anywhere.
In the future it is easy to imagine features being added that streamline ordering services for users (a table reservation at the restaurant for après-ski, say) or Vail leveraging the data to make business decisions that provide more real-time offers to guests on the mountain or to frequent visitors on their next visit. And maybe we will see some of the new ski-oriented wearables, like XON bindings, integrated with solutions like EpicMix so it is possible to get even more data without needing a second smartphone application.
Information for this post comes from Mapleton Hill Media and Vail Resorts