What are the Possibilities of Big Data?
First, let’s start with a description of what exactly Big Data is. Simply put: lots and lots of data. According to Wikipedia: “Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other certain advanced methods to extract value from data, and seldom to a particular size of data set.”
There are many different sources of data (claims systems, enrollment systems, benefits administration systems, survey results, consumer data, social media, personal health devices – like Fitbit). Each source generates an amazing amount of data. These data sets grow in size because they are being gathered by readily available and numerous information-sensing mobile devices, aerial (remote sensing) platforms, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data were created every day. In order to make sense of all of this data, we need to be able to organize it, create linkages between the data, and then perform analysis on the data to drive meaningful action.
In 2000, Seisint Inc. developed a C++-based distributed file-sharing framework for data storage and querying to support the vast amount of storage that is necessary for this data. With this framework, structured, semi-structured and/or unstructured data can be stored and distributed across multiple servers.
In 2004, Google published a paper on a process called MapReduce that uses a similar distributed file-sharing architecture. The MapReduce framework provides a parallel processing model and associated implementation to process huge amounts of data. With MapReduce, queries are split and distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered (the Reduce step). The framework was very successful, and others wanted to replicate the algorithm, so an implementation of the MapReduce framework was adopted by an Apache open source project named Hadoop.
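To make the Map and Reduce steps above concrete, here is a minimal single-machine sketch in Python of the classic word-count example. The shuffle step, which a real Hadoop cluster performs for you across nodes, is simulated in-process; all function names here are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (key, value) pairs -- here, one (word, 1) per word."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate pairs by key (the framework does this across nodes in real Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(grouped):
    """Reduce step: combine all values emitted for each key."""
    return {key: sum(values) for key, values in grouped}

records = ["claim approved", "claim denied", "claim approved"]
counts = reduce_phase(shuffle(map_phase(records)))
# counts -> {'claim': 3, 'approved': 2, 'denied': 1}
```

In a real cluster, each of these three phases runs in parallel across many machines, which is what makes the model scale to the data volumes discussed here.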
With Hadoop, payers have the ability to store a vast amount of data at a fairly inexpensive price point. Because the framework distributes both storage and processing, access to the data can happen in a timely manner and payers are able to interact effectively with their distributed data.
Within the Healthcare Payer market, there are many potential use cases for Hadoop and big data. Once the data is stored and linked, and relationships between the data are created, some of the benefits we anticipate include:
- Re-Admission Risk Analysis – One of the key predictors of re-admission rates is whether or not the patient has someone to help them at home. The ability to determine household information (through relationships in member data – for example, addresses and care team relationships available within a master data management solution populated with data from a Hadoop cluster) would be very helpful to identify at-risk patients and provide targeted care post discharge. Data from social media outlets can provide quite a bit of household information.
- STARS Rating Improvement – In addition to missed care management plans and drug adherence, another interesting thing that could be better aligned is the member/provider link. Perhaps one specific provider is more successful at getting patients to adhere to diabetes management protocols, while another provider is not very successful at getting hip replacement patients to complete physical therapy. Being able to link the patient to the provider, along with the clinical data, can help identify where to focus remediation efforts for possibly modifying provider or member behavior.
- Member Engagement – Taking householding further and putting the re-admission risk analysis to work: once payers are able to household a group of members and link the household to a specific address, they might be able to better predict how a new member at the same physical location will behave. Payers could then target their outreach to new members from the beginning, using engagement methodologies that have been successful for that physical location in the past.
In order to create the household, or determine how a member feels about a provider (which can then impact how they adhere to treatment plans) or understand how neighborhoods (which are groupings of households) may engage with their providers, payers need access to a vast amount of data. They also need to be able to sift through this data efficiently to create the relationship links as quickly as possible. Sifting through the data is enabled with Hadoop and Big Data. Relating the data can be done with master data management (which I will talk about next).
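The householding idea above can be sketched very simply: group member records that resolve to the same normalized address. This is only an illustrative toy – the member records and field names below are hypothetical, and real master data management matching uses far more sophisticated fuzzy-matching and survivorship rules than this crude normalization.

```python
from collections import defaultdict

# Hypothetical member records; field names are illustrative only.
members = [
    {"member_id": "M1", "address": "12 Oak St, Springfield"},
    {"member_id": "M2", "address": "12 oak st springfield"},
    {"member_id": "M3", "address": "99 Elm Ave, Shelbyville"},
]

def normalize(address):
    """Crude normalization: lowercase and strip non-alphanumerics.
    Real MDM address matching is far more sophisticated."""
    return "".join(ch for ch in address.lower() if ch.isalnum())

# Group members sharing a normalized address into a household.
households = defaultdict(list)
for m in members:
    households[normalize(m["address"])].append(m["member_id"])

# M1 and M2 land in one household; M3 forms its own.
```

At scale, the normalization would run as a Map step over the full member file and the grouping as the shuffle/Reduce step, which is exactly the kind of work a Hadoop cluster is built for.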
Where is the best place to get started on a Big Data solution? The Big, Big Data Workbook addresses:
- How to choose the right big data project and make it bulletproof from the start – setting clear business and IT objectives, defining metrics that prove your project’s value, and being strategic about datasets, tools and hand-coding.
- What to consider when building your team and data governance framework – making the most of existing skills, thinking strategically about the composition of the team, and ensuring effective communication and alignment of the project goals.
- How to ensure your big data supply chain is lean and effective – establishing clear, repeatable, scalable, and continuously improving processes, and a blueprint for building the ideal big data technology and process architecture.