Category Archives: Data Warehousing
“If I had my way, I’d fire the statisticians – all of them – they don’t add value”.
Surely not? Why would you fire the very people who were employed to make sense of the vast volumes of manufacturing data and guide future production? But he was right. The problem was at that time data management was so poor that data was simply not available for the statisticians to analyze.
So, perhaps this title should be re-written to be:
Fire your Data Scientists – They Aren’t Able to Add Value.
Although this statement is a bit extreme, the same situation may still exist. Data scientists frequently share frustrations such as:
- “I’m told our data is 60% accurate, which means I can’t trust any of it.”
- “We achieved our goal of an answer within a week by working 24 hours a day.”
- “Each quarter we manually prepare 300 slides to anticipate all questions the CFO may ask.”
- “Fred manually audits 10% of the invoices. When he is on holiday, we just don’t do the audit.”
This is why I think the original quote is so insightful. Value from data is not automatically delivered by hiring a statistician, analyst or data scientist. Even with the latest data mining technology, one person cannot positively influence a business without the proper data to support them.
Most organizations are unfamiliar with the structure required to deliver value from their data. New storage technologies will be introduced and a variety of analytics tools will be tried and tested. This change is crucial for to success. In order for statisticians to add value to a company, they must have access to high quality data that is easily sourced and integrated. That data must be available through the latest analytics technology. This new ecosystem should provide insights that can play a role in future production. Staff will need to be trained, as this new data will be incorporated into daily decision making.
With a rich 20-year history, Informatica understands data ecosystems. Employees become wasted investments when they do not have access to the trusted data they need in order to deliver their true value.
Who wants to spend their time recreating data sets to find a nugget of value only to discover it can’t be implemented?
Build a analytical ecosystem with a balanced focus on all aspects of data management. This will mean that value delivery is limited only by the imagination of your employees. Rather than questioning the value of an analytics team, you will attract some of the best and the brightest. Then, you will finally be able to deliver on the promised value of your data.
The transition to value-based care is well underway. From healthcare delivery organizations to clinicians, payers, and patients, everyone feels the impact. Each has a role to play. Moving to a value-driven model demands agility from people, processes, and technology. Organizations that succeed in this transformation will be those in which:
- Collaboration is commonplace
- Clinicians and business leaders wear new hats
- Data is recognized as an enterprise asset
The ability to leverage data will differentiate the leaders from the followers. Successful healthcare organizations will:
1) Establish analytics as a core competency
2) Rely on data to deliver best practice care
3) Engage patients and collaborate across the ecosystem to foster strong, actionable relationships
Trustworthy data is required to power the analytics that reveal the right answers, to define best practice guidelines and to identify and understand relationships across the ecosystem. In order to advance, data integration must also be agile. The right answers do not live in a single application. Instead, the right answers are revealed by integrating data from across the entire ecosystem. For example, in order to deliver personalized medicine, you must analyze an integrated view of data from numerous sources. These sources could include multiple EMRs, genomic data, data marts, reference data and billing data.
A recent PWC survey showed that 62% of executives believe data integration will become a competitive advantage. However, a July 2013 Information Week survey reported that 40% of healthcare executives gave their organization only a grade D or F on preparedness to manage the data deluge.
What grade would you give your organization?
You can improve your organization’s grade, but it will require collaboration between business and IT. If you are in IT, you’ll need to collaborate with business users who understand the data. You must empower them with self-service tools for improving data quality and connecting data. If you are a business leader, you need to understand and take an active role with the data.
To take the next step, download our new eBook, “Potential Unlocked: Transforming healthcare by putting information to work.” In it, you’ll learn:
- How to put your information to work
- New ways to govern your data
- What other healthcare organizations are doing
- How to overcome common barriers
So go ahead, download it now and let me know what you think. I look forward to hearing your questions and comments….oh, and your grade!
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review around how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions ranging from what product to develop next to who to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation and as they spread supply will attempt to catch up to create an equilibrium. However, this is about a problem, which has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sector. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into a meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why you ask? Because at some point I would like to have at least a business-super user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in memory DB…) and massage it so that his self-created dashboard gets him/her where (s)he needs to go. This should include concepts like; “where is the data I need for this insight?’, “what is missing and how do I get to that piece in the best way?”, “how do I want it to look to share it?” All that is required should be a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
As I continue to counsel insurers about master data, they all agree immediately that it is something they need to get their hands around fast. If you ask participants in a workshop at any carrier; no matter if life, p&c, health or excess, they all raise their hands when I ask, “Do you have broadband bundle at home for internet, voice and TV as well as wireless voice and data?”, followed by “Would you want your company to be the insurance version of this?”
Now let me be clear; while communication service providers offer very sophisticated bundles, they are also still grappling with a comprehensive view of a client across all services (data, voice, text, residential, business, international, TV, mobile, etc.) each of their touch points (website, call center, local store). They are also miles away of including any sort of meaningful network data (jitter, dropped calls, failed call setups, etc.)
Similarly, my insurance investigations typically touch most of the frontline consumer (business and personal) contact points including agencies, marketing (incl. CEM & VOC) and the service center. On all these we typically see a significant lack of productivity given that policy, billing, payments and claims systems are service line specific, while supporting functions from developing leads and underwriting to claims adjucation often handle more than one type of claim.
This lack of performance is worsened even more by the fact that campaigns have sub-optimal campaign response and conversion rates. As touchpoint-enabling CRM applications also suffer from a lack of complete or consistent contact preference information, interactions may violate local privacy regulations. In addition, service centers may capture leads only to log them into a black box AS400 policy system to disappear.
Here again we often hear that the fix could just happen by scrubbing data before it goes into the data warehouse. However, the data typically does not sync back to the source systems so any interaction with a client via chat, phone or face-to-face will not have real time, accurate information to execute a flawless transaction.
On the insurance IT side we also see enormous overhead; from scrubbing every database from source via staging to the analytical reporting environment every month or quarter to one-off clean up projects for the next acquired book-of-business. For a mid-sized, regional carrier (ca. $6B net premiums written) we find an average of $13.1 million in annual benefits from a central customer hub. This figure results in a ROI of between 600-900% depending on requirement complexity, distribution model, IT infrastructure and service lines. This number includes some baseline revenue improvements, productivity gains and cost avoidance as well as reduction.
On the health insurance side, my clients have complained about regional data sources contributing incomplete (often driven by local process & law) and incorrect data (name, address, etc.) to untrusted reports from membership, claims and sales data warehouses. This makes budgeting of such items like medical advice lines staffed by nurses, sales compensation planning and even identifying high-risk members (now driven by the Affordable Care Act) a true mission impossible, which makes the life of the pricing teams challenging.
Over in the life insurers category, whole and universal life plans now encounter a situation where high value clients first faced lower than expected yields due to the low interest rate environment on top of front-loaded fees as well as the front loading of the cost of the term component. Now, as bonds are forecast to decrease in value in the near future, publicly traded carriers will likely be forced to sell bonds before maturity to make good on term life commitments and whole life minimum yield commitments to keep policies in force.
This means that insurers need a full profile of clients as they experience life changes like a move, loss of job, a promotion or birth. Such changes require the proper mitigation strategy, which can be employed to protect a baseline of coverage in order to maintain or improve the premium. This can range from splitting term from whole life to using managed investment portfolio yields to temporarily pad premium shortfalls.
Overall, without a true, timely and complete picture of a client and his/her personal and professional relationships over time and what strategies were presented, considered appealing and ultimately put in force, how will margins improve? Surely, social media data can help here but it should be a second step after mastering what is available in-house already. What are some of your experiences how carriers have tried to collect and use core customer data?
Recommendations and illustrations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warrantee or representation of success, either express or implied, is made.
Unlike some of my friends, History was a subject in high school and college that I truly enjoyed. I particularly appreciated biographies of favorite historical figures because it painted a human face and gave meaning and color to the past. I also vowed at that time to navigate my life and future under the principle attributed to Harvard professor Jorge Agustín Nicolás Ruiz de Santayana y Borrás that goes, “Those who cannot remember the past are condemned to repeat it.”
So that’s a little ditty regarding my history regarding history.
Forwarding now to the present in which I have carved out my career in technology, and in particular, enterprise software, I’m afforded a great platform where I talk to lots of IT and business leaders. When I do, I usually ask them, “How are you implementing advanced projects that help the business become more agile or effective or opportunistically proactive?” They usually answer something along the lines of “this is the age and renaissance of data science and analytics” and then end up talking exclusively about their meat and potatoes business intelligence software projects and how 300 reports now run their business.
Then when I probe and hear their answer more in depth, I am once again reminded of THE history quote and think to myself there’s an amusing irony at play here. When I think about the Business Intelligence systems of today, most are designed to “remember” and report on the historical past through large data warehouses of a gazillion transactions, along with basic, but numerous shipping and billing histories and maybe assorted support records.
But when it comes right down to it, business intelligence “history” is still just that. Nothing is really learned and applied right when and where it counted – AND when it would have made all the difference had the company been able to react in time.
So, in essence, by using standalone BI systems as they are designed today, companies are indeed condemned to repeat what they have already learned because they are too late – so the same mistakes will be repeated again and again.
This means the challenge for BI is to reduce latency, measure the pertinent data / sensors / events, and get scalable – extremely scalable and flexible enough to handle the volume and variety of the forthcoming data onslaught.
There’s a part 2 to this story so keep an eye out for my next blog post History Repeats Itself (Part 2)
Data is everywhere. It’s in databases and applications spread across your enterprise. It’s in the hands of your customers and partners. It’s in cloud applications and cloud servers. It’s on spreadsheets and documents on your employee’s laptops and tablets. It’s in smartphones, sensors and GPS devices. It’s in the blogosphere, the twittersphere and your friends’ Facebook timelines. (more…)
Data warehouses tend to grow very quickly because they integrate data from multiple sources and maintain years of historical data for analytics. A number of our customers have data warehouses in the hundreds of terabytes to petabytes range. Managing such a large amount of data becomes a challenge. How do you curb runaway costs in such an environment? Completing maintenance tasks within the prescribed window and ensuring acceptable performance are also big challenges.
We have provided best practices to archive aged data from data warehouses. Archiving data will keep the production data size at almost a constant level, reducing infrastructure and maintenance costs, while keeping performance up. At the same time, you can still access the archived data directly if you really need to from any reporting tool. Yet many are loath to move data out of their production system. This year, at Informatica World, we’re going to discuss another method of managing data growth without moving data out of the production data warehouse. I’m not going to tell you what this new method is, yet. You’ll have to come and learn more about it at my breakout session at Informatica World: What’s New from Informatica to Improve Data Warehouse Performance and Lower Costs.
I look forward to seeing all of you at Aria, Las Vegas next month. Also, I am especially excited to see our ILM customers at our second Product Advisory Council again this year.
Join us this year at Informatica World!
We have a great line up of speakers and events to help you become a data driven healthcare organization… I’ve provided a few highlights below:
Participate in the Informatica World Keynote sessions with Sohaib Abbasi and Rick Smolan who wrote “The Human Face of Big Data” — learn more via this quick YouTube video: http://www.youtube.com/watch?v=7K5d9ArRLJE&feature=player_embedded
With more than 100 interactive and in-depth breakout sessions, spanning 6 different tracks, (Platform & Products, Architecture, Best Practices, Big Data, Hybrid IT and Tech Talk), Informatica World is an excellent way to ensure you are getting the most from your Informatica investment. Learn best practices from organizations who are realizing the potential of their data like: Ochsner Health, Sutter Health, UMass Memorial, Qualcomm and Paypal.
Finally, we want you to balance work with a little play… we invite you to network with industry peers at our Healthcare Cocktail Reception on the evening of Wednesday, June 5th and again during our Data Driven Healthcare Breakfast Roundtable on Thursday, June 6th.
See you there!
The data warehouse’s goal is timely delivery of trusted data to support decision-enabling insights. However, it’s difficult to get insights out of an environment that’s hard to see inside of. This is why, as much as is possible given the necessities of data privacy, a data warehouse should be turned into a glass house, allowing us to see data quality and business intelligence challenges as they truly are.
Trusted data is not perfect data. Trusted data is transparent data, honest about its imperfections, and realistic about the practical trade-offs between delivery and quality. You can’t fix what you can’t see, but even more important, concealing or ignoring known data quality issues is only going to decrease business users’ trust of the data warehouse. Perfect data is impossible, but the more control enforced wherever data originates, and the more monitoring performed wherever data flows, the better overall data quality will be in the warehouse. (more…)