Category Archives: Data Warehousing
In 2012, Forbes published an article predicting an upcoming problem.
The Need for Scalable Enterprise Analytics
Specifically, increased exploration in Big Data opportunities would place pressure on the typical corporate infrastructure. The generic hardware used to run most tech industry enterprise applications was not designed to handle real-time data processing. As a result, the explosion of mobile usages, and the proliferation of social networks, was increasing the strain on the system. Most companies now faced real-time processing requirements beyond what the traditional model was designed to handle.
In the past two years, the volume of data and speed of data growth has grown significantly. As a result, the problem has become more severe. It is now clear that these challenges can’t be overcome by simply doubling or tripling their IT spending on infrastructure sprawl. Today, enterprises seek consolidated solutions that offer scalability, performance and ease of administration. The present need is for scalable enterprise analytics.
A Clear Solution Is Available
Informatica PowerCenter and Data Quality is the market leading data integration and data quality platform. This platform has now been certified by Oracle as an optimal solution for both the Oracle Exadata Database Machine and the Oracle SuperCluster.
As the high-speed on-ramp for data into Oracle Exadata, PowerCenter and Data Quality deliver up-to five times faster performance on data load, query, profiling and cleansing tasks. Informatica’s data integration customers can now easily reuse data integration code, skills and resources to access and transform any data from any data source and load it into Exadata, with the highest throughput and scalability.
Customers adopting Oracle Exadata for high-volume, high-speed analytics can now be confident with Informatica PowerCenter and Data Quality. With these products, they can ingest, cleanse and transform all types of data into Exadata with the highest performance and scale required to maximize the value of their Exadata investment.
Proving the Value of Scalable Enterprise Analytics
In order to demonstrate the efficacy of their partnership, the two companies worked together on a Proof Of Value (POV) project. The goal is to prove that using PowerCenter with Exadata would improve both performance and scalability. The project involved PowerCenter and Data Quality 9.6.1 and x4-2 Exadata Machine. Oracle 11g was considered for both standard Oracle and Exadata versions.
The first test conducted a 1TB load test to Exadata and standard Oracle in a typical PowerCenter use case. The second test consisted of querying 1TB profiling warehouse database in Data Quality use case scenario. Performance data was collected for both tests. The scalability factor was also captured. A variant of the TPCH dataset was used to generate the test data. The results were significantly higher than prior Exabyte 1TB test. In particular:
- The data query tests achieved 5x performance.
- The data load tests achieved a 3x-5x speed increase.
- Linear scalability was achieved with read/write tests on Exadata.
What Business Benefits Could You Expect?
Informatica PowerCenter and Data Quality, along-with Oracle Exadata, now provide the best-of-breed combination of software and hardware, optimized to deliver the highest possible total system performance. These comprehensive tools drive agile reporting and analytics, while empowering IT organizations to meet SLAs and quality goals like never before.
- Extend Oracle Exadata’s access to even more business critical data sources. Utilize optimized out-of-the-box Informatica connectivity to easily access hundreds of data sources, including all the major databases, on-premise and cloud applications, mainframe, social data and Hadoop.
- Get more data, more quickly into Oracle Exadata. Move higher volumes of trusted data quickly into Exadata to support timely reporting with up-to-date information (i.e. up to 5x performance improvement compared to Oracle database).
- Centralize management and improve insight into large scale data warehouses. Deliver the necessary insights to stakeholders with intuitive data lineage and a collaborative business glossary. Contribute to high quality business analytics, in a timely manner across the enterprise.
- Instantly re-direct workloads and resources to Oracle Exadata without compromising performance. Leverage existing code and programming skills to execute high-performance data integration directly on Exadata by performing push down optimization.
- Roll-out data integration projects faster and more cost-effectively. Customers can now leverage thousands of Informatica certified developers to execute existing data integration and quality transformations directly on Oracle Exadata, without any additional coding.
- Efficiently scale-up and scale-out. Customers can now maximize performance and lower the costs of data integration and quality operations of any scale by performing Informatica workload and push down optimization on Oracle Exadata.
- Save significant costs involved in administration and expansion. Customers can now easily and economically manage large-scale analytics data warehousing environments with a single point of administration and control, and consolidate a multitude of servers on one rack.
- Reduce risk. Customers can now leverage Informatica’s data integration and quality platform to overcome the typical performance and scalability limitations seen in databases and data storage systems. This will help reduce quality-of-service risks as data volumes rise.
Oracle Exadata is a well-engineered system that offers customers out-of-box scalability and performance on demand. Informatica PowerCenter and Data Quality are optimized to run on Exadata, offering customers business benefits that speed up data integration and data quality tasks like never before. Informatica’s certified, optimized, and purpose-built solutions for Oracle can help you enable more timely and trustworthy reporting. You can now benefit from Informatica’s optimized solutions for Oracle Exadata to make better business decisions by unlocking the full potential of the most current and complete enterprise data available. As shown in our test results, you can attain up to 5x performance by scaling Exadata. Informatica Data Quality customers can perform profiling 1TB datasets, which is unheard before. We urge you to deploy the combined solution to solve your data integration and quality problems today while achieving high speed business analytics in these days of big data exploration and Internet Of Things.
Listen to what Ash Kulkarni, SVP, at OOW14 has to say on how @InformaticaCORP PowerCenter and Data Quality certified by Oracle as optimized for Exadata can deliver up-to five times faster performance improvement on data load, query, profiling, cleansing and mastering tasks, for Exadata.
This post was written by guest author Dale Kim, Director of Industry Solutions at MapR Technologies, a valued Informatica partner that provides a distribution for Apache Hadoop that ensures production success for its customers.
Apache Hadoop is growing in popularity as the foundation for an enterprise data hub. An Enterprise Data Hub (EDH) extends and optimizes the traditional data warehouse model by adding complementary big data technologies. It focuses your data warehouse on high value data by reallocating less frequently used data to an alternative platform. It also aggregates data from previously untapped sources to give you a more complete picture of data.
So you have your data, your warehouses, your analytical tools, your Informatica products, and you want to deploy an EDH… now what about Hadoop?
Requirements for Hadoop in an Enterprise Data Hub
Let’s look at characteristics required to meet your EDH needs for a production environment:
You already expect these from your existing enterprise deployments. Shouldn’t you hold Hadoop to the same standards? Let’s discuss each topic:
Enterprise-grade is about the features that keep a system running, i.e., high availability (HA), disaster recovery (DR), and data protection. HA helps a system run even when components (e.g., computers, routers, power supplies) fail. In Hadoop, this means no downtime and no data loss, but also no work loss. If a node fails, you still want jobs to run to completion. DR with remote replication or mirroring guards against site-wide disasters. Mirroring needs to be consistent to ensure recovery to a known state. Using file copy tools won’t cut it. And data protection, using snapshots, lets you recover from data corruption, especially from user or application errors. As with DR replicas, snapshots must be consistent, in that they must reflect the state of the data at the time the snapshot was taken. Not all Hadoop distributions can offer this guarantee.
Hadoop interoperability is an obvious necessity. Features like a POSIX-compliant, NFS-accessible file system let you reuse existing, file system-based applications on Hadoop data. Support for existing tools lets your developers get up to speed quickly. And integration with REST APIs enables easy, open connectivity with other systems.
You should be able to logically divide clusters to support different use cases, job types, user group, and administrators as needed. To avoid a complex, multi-cluster setup, choose a Hadoop distribution with multi-tenancy capabilities to simplify the architecture. This gives you less risk for error and no data/effort duplication.
Security should be a priority to protect against the exposure of confidential data. You should assess how you’ll handle authentication (with or without Kerberos), authorization (access controls), over-the-network encryption, and auditing. Many of these features should be native to your Hadoop distribution, and there are also strong security vendors that provide technologies for securing Hadoop.
Any large scale deployment needs fast read, write, and update capabilities. Hadoop can support the operational requirements of an EDH with integrated, in-Hadoop databases like Apache HBase™ and Accumulo™, as well as MapR-DB (the MapR NoSQL database). This in-Hadoop model helps to simplify the overall EDH architecture.
Using Hadoop as a foundation for an EDH is a powerful option for businesses. Choosing the correct Hadoop distribution is the key to deploying a successful EDH. Be sure not to take shortcuts – especially in a production environment – as you will want to hold your Hadoop platform to the same high expectations you have of your existing enterprise systems.
How are they accomplishing this? A new generation of hackers has learned to reverse engineer popular software programs (e.g. Windows, Outlook Java, etc.) in order to find so called “holes”. Once those holes are exploited, the hackers develop “bugs” that infiltrate computer systems, search for sensitive data and return it to the bad guys. These bugs are then sold in the black market to the highest bidder. When successful, these hackers can wreak havoc across the globe.
I recently read a Time Magazine article titled “World War Zero: How Hackers Fight to Steal Your Secrets.” The article discussed a new generation of software companies made up of former hackers. These firms help other software companies by identifying potential security holes, before they can be used in malicious exploits.
This constant battle between good (data and software security firms) and bad (smart, young, programmers looking to make a quick/big buck) is happening every day. Unfortunately, the average consumer (you and I) are the innocent victims of this crazy and costly war. As a consumer in today’s digital and data-centric age, I worry when I see these headlines of ongoing data breaches from the Targets of the world to my local bank down the street. I wonder not “if” but “when” I will become the next victim. According to the Ponemon institute, the average cost to a company was $3.5 million in US dollars and 15 percent more than what it cost last year.
As a 20 year software industry veteran, I’ve worked with many firms across global financial services industry. As a result, my concerned about data security exceed those of the average consumer. Here are the reasons for this:
- Everything is Digital: I remember the days when ATM machines were introduced, eliminating the need to wait in long teller lines. Nowadays, most of what we do with our financial institutions is digital and online whether on our mobile devices to desktop browsers. As such every interaction and transaction is creating sensitive data that gets disbursed across tens, hundreds, sometimes thousands of databases and systems in these firms.
- The Big Data Phenomenon: I’m not talking about sexy next generation analytic applications that promise to provide the best answer to run your business. What I am talking about is the volume of data that is being generated and collected from the countless number of computer systems (on-premise and in the cloud) that run today’s global financial services industry.
- Increase use of Off-Shore and On-Shore Development: Outsourcing technology projects to offshore development firms has be leverage off shore development partners to offset their operational and technology costs. With new technology initiatives.
Now here is the hard part. Given these trends and heightened threats, do the companies I do business with know where the data resides that they need to protect? How do they actually protect sensitive data when using it to support new IT projects both in-house or by off-shore development partners? You’d be amazed what the truth is.
According to the recent Ponemon Institute study “State of Data Centric Security” that surveyed 1,587 Global IT and IT security practitioners in 16 countries:
- Only 16 percent of the respondents believe they know where all sensitive structured data is located and a very small percentage (7 percent) know where unstructured data resides.
- Fifty-seven percent of respondents say not knowing where the organization’s sensitive or confidential data is located keeps them up at night.
- Only 19 percent say their organizations use centralized access control management and entitlements and 14 percent use file system and access audits.
Even worse, those surveyed said that not knowing where sensitive and confidential information resides is a serious threat and the percentage of respondents who believe it is a high priority in their organizations. Seventy-nine percent of respondents agree it is a significant security risk facing their organizations. But a much smaller percentage (51 percent) believes that securing and/or protecting data is a high priority in their organizations.
I don’t know about you but this is alarming and worrisome to me. I think I am ready to reach out to my banker and my local retailer and let him know about my concerns and make sure they ask and communicate my concerns to the top of their organization. In today’s globally and socially connected world, news travels fast and given how hard it is to build trustful customer relationships, one would think every business from the local mall to Wall St should be asking if they are doing what they need to identify and protect their number one digital asset – Their data.
This creative thinking to solve a problem came from a request to build a soldier knife from the Swiss Army. In the end, the solution was all about getting the right tool for the right job in the right place. In many cases soldiers didn’t need industrial strength tools, all they really needed was a compact and lightweight tool to get the job at hand done quickly.
Putting this into perspective with today’s world of Data Integration, using enterprise-class data integration tools for the smaller data integration project is over kill and typically out of reach for the smaller organization. However, these smaller data integration projects are just as important as those larger enterprise projects, and they are often the innovation behind a new way of business thinking. The traditional hand-coding approach to addressing the smaller data integration project is not-scalable, not-repeatable and prone to human error, what’s needed is a compact, flexible and powerful off-the-shelf tool.
Thankfully, over a century after the world embraced the Swiss Army Knife, someone at Informatica was paying attention to revolutionary ideas. If you’ve not yet heard the news about the Informatica platform, a version called PowerCenter Express has been released and it is free of charge so you can use it to handle an assortment of what I’d characterize as high complexity / low volume data integration challenges and experience a subset of the Informatica platform for yourself. I’d emphasize that PowerCenter Express doesn’t replace the need for Informatica’s enterprise grade products, but it is ideal for rapid prototyping, profiling data, and developing quick proof of concepts.
PowerCenter Express provides a glimpse of the evolving Informatica platform by integrating four Informatica products into a single, compact tool. There are no database dependencies and the product installs in just under 10 minutes. Much to my own surprise, I use PowerCenter express quite often going about the various aspects of my job with Informatica. I have it installed on my laptop so it travels with me wherever I go. It starts up quickly so it’s ideal for getting a little work done on an airplane.
For example, recently I wanted to explore building some rules for an upcoming proof of concept on a plane ride home so I could claw back some personal time for my weekend. I used PowerCenter Express to profile some data and create a mapping. And this mapping wasn’t something I needed to throw away and recreate in an enterprise version after my flight landed. Vibe, Informatica’s build once / run anywhere metadata driven architecture allows me to export a mapping I create in PowerCenter Express to one of the enterprise versions of Informatica’s products such as PowerCenter, DataQuality or Informatica Cloud.
As I alluded to earlier in this article, being a free offering I honestly didn’t expect too much from PowerCenter Express when I first started exploring it. However, due to my own positive experiences, I now like to think of PowerCenter Express as the Swiss Army Knife of Data Integration.
To start claiming back some of your personal time, get started with the free version of PowerCenter Express, found on the Informatica Marketplace at: https://community.informatica.com/solutions/pcexpress
Getting started with Cloud Data Warehousing using Amazon Redshift is now easier than ever, thanks to the Informatica Cloud’s 60-day trial for Amazon Redshift. Now, anyone can easily and quickly move data from any on-premise, cloud, Big Data, or relational data sources into Amazon Redshift without writing a single line of code and without being a data integration expert. You can use Informatica Cloud’s six-step wizard to quickly replicate your data or use the productivity-enhancing cloud integration designer to tackle more advanced use cases, such as combining multiple data sources into one Amazon Redshift table. Existing Informatica PowerCenter users can use Informatica Cloud and Amazon Redshift to extend an existing data warehouse with through an affordable and scalable approach. If you are currently exploring self-service business intelligence solutions such as Birst, Tableau, or Microstrategy, the combination of Redshift and Informatica Cloud makes it incredibly easy to prepare the data for analytics by any BI solution.
To get started, execute the following steps:
- Go to http://informaticacloud.com/cloud-trial-for-redshift and click on the ‘Sign Up Now’ link
- You’ll be taken to the Informatica Marketplace listing for the Amazon Redshift trial. Sign up for a Marketplace account if you don’t already have one, and then click on the ‘Start Free Trial Now’ button
- You’ll then be prompted to login with your Informatica Cloud account. If you do not have an Informatica Cloud username and password, register one by clicking the appropriate link and fill in the required details
- Once you finish registration and obtain your login details, download the Vibe ™ Secure Agent to your Amazon EC2 virtual machine (or to a local Windows or Linux instance), and ensure that it can access your Amazon S3 bucket and Amazon Redshift cluster.
- Ensure that your S3 bucket, and Redshift cluster are both in the same availability zone
- To start using the Informatica Cloud connector for Amazon Redshift, create a connection to your Amazon Redshift nodes by providing your AWS Access Key ID and Secret Access Key, specifying your cluster details, and obtaining your JDBC URL string.
You are now ready to begin moving data to and from Amazon Redshift by creating your first Data Synchronization task (available under Applications). Pick a source, pick your Redshift target, map the fields, and you’re done!
The value of using Informatica Cloud to load data into Amazon Redshift is the ability of the application to move massive amounts of data in parallel. The Informatica engine optimizes by moving processing close to where the data is using push-down technology. Unlike other data integration solutions for Redshift that perform batch processing using an XML engine which is inherently slow when processing large data volumes and don’t have multitenant architectures that scale well, Informatica Cloud processes over 2 billion transactions every day.
Amazon Redshift has brought agility, scalability, and affordability to petabyte-scale data warehousing, and Informatica Cloud has made it easy to transfer all your structured and unstructured data into Redshift so you can focus on getting data insights today, not weeks from now.
It is troublesome to me to repeatedly get into conversations with IT managers who want to fix data “for the sake of fixing it”. While this is presumably increasingly rare, due to my department’s role, we probably see a higher occurrence than the normal software vendor employee. Given that, please excuse the inflammatory title of this post.
Nevertheless, once the deal is done, we find increasingly fewer of these instances, yet still enough, as the average implementation consultant or developer cares about this aspect even less. A few months ago a petrochemical firm’s G&G IT team lead told me that he does not believe that data quality improvements can or should be measured. He also said, “if we need another application, we buy it. End of story.” Good for software vendors, I thought, but in most organizations $1M here or there do not lay around leisurely plus decision makers want to see the – dare I say it – ROI.
However, IT and business leaders should take note that a misalignment due to lack OR disregard of communication is a critical success factor. If the business does not get what it needs and wants AND it differs what Corporate IT is envisioning and working on – and this is what I am talking about here – it makes any IT investment a risky proposition.
Let me illustrate this with 4 recent examples I ran into:
1. Potential for flawed prioritization
A retail customer’s IT department apparently knew that fixing and enriching a customer loyalty record across the enterprise is a good and financially rewarding idea. They only wanted to understand what the less-risky functional implementation choices where. They indicated that if they wanted to learn what the factual financial impact of “fixing” certain records or attributes, they would just have to look into their enterprise data warehouse. This is where the logic falls apart as the warehouse would be just as unreliable as the “compromised” applications (POS, mktg, ERP) feeding it.
Even if they massaged the data before it hit the next EDW load, there is nothing inherently real-time about this as all OLTP are running processes of incorrect (no bidirectional linkage) and stale data (since the last load).
I would question if the business is now completely aligned with what IT is continuously correcting. After all, IT may go for the “easy or obvious” fixes via a weekly or monthly recurring data scrub exercise without truly knowing, which the “biggest bang for the buck” is or what the other affected business use cases are, they may not even be aware of yet. Imagine the productivity impact of all the roundtripping and delay in reporting this creates. This example also reminds me of a telco client, I encountered during my tenure at another tech firm, which fed their customer master from their EDW and now just found out that this pattern is doomed to fail due to data staleness and performance.
2. Fix IT issues and business benefits will trickle down
Client number two is a large North American construction Company. An architect built a business case for fixing a variety of data buckets in the organization (CRM, Brand Management, Partner Onboarding, Mobility Services, Quotation & Requisitions, BI & EPM).
Grand vision documents existed and linked to the case, which stated how data would get better (like a sick patient) but there was no mention of hard facts of how each of the use cases would deliver on this. After I gave him some major counseling what to look out and how to flesh it out – radio silence. Someone got scared of the math, I guess.
3. Now that we bought it, where do we start
The third culprit was a large petrochemical firm, which apparently sat on some excess funds and thought (rightfully so) it was a good idea to fix their well attributes. More power to them. However, the IT team is now in a dreadful position having to justify to their boss and ultimately the E&P division head why they prioritized this effort so highly and spent the money. Well, they had their heart in the right place but are a tad late. Still, I consider this better late than never.
4. A senior moment
The last example comes from a South American communications provider. They seemingly did everything right given the results they achieved to date. This gets to show that misalignment of IT and business does not necessarily wreak havoc – at least initially.
However, they are now in phase 3 of their roll out and reality caught up with them. A senior moment or lapse in judgment maybe? Whatever it was; once they fixed their CRM, network and billing application data, they had to start talking to the business and financial analysts as complaints and questions started to trickle in. Once again, better late than never.
So what is the take-away from these stories. Why wait until phase 3, why have to be forced to cram some justification after the purchase? You pick, which one works best for you to fix this age-old issue. But please heed Sohaib’s words of wisdom recently broadcast on CNN Money “IT is a mature sector post bubble…..now it needs to deliver the goods”. And here is an action item for you – check out the new way for the business user to prepare their own data (30 minutes into the video!). Agreed?
“If I had my way, I’d fire the statisticians – all of them – they don’t add value”.
Surely not? Why would you fire the very people who were employed to make sense of the vast volumes of manufacturing data and guide future production? But he was right. The problem was at that time data management was so poor that data was simply not available for the statisticians to analyze.
So, perhaps this title should be re-written to be:
Fire your Data Scientists – They Aren’t Able to Add Value.
Although this statement is a bit extreme, the same situation may still exist. Data scientists frequently share frustrations such as:
- “I’m told our data is 60% accurate, which means I can’t trust any of it.”
- “We achieved our goal of an answer within a week by working 24 hours a day.”
- “Each quarter we manually prepare 300 slides to anticipate all questions the CFO may ask.”
- “Fred manually audits 10% of the invoices. When he is on holiday, we just don’t do the audit.”
This is why I think the original quote is so insightful. Value from data is not automatically delivered by hiring a statistician, analyst or data scientist. Even with the latest data mining technology, one person cannot positively influence a business without the proper data to support them.
Most organizations are unfamiliar with the structure required to deliver value from their data. New storage technologies will be introduced and a variety of analytics tools will be tried and tested. This change is crucial for to success. In order for statisticians to add value to a company, they must have access to high quality data that is easily sourced and integrated. That data must be available through the latest analytics technology. This new ecosystem should provide insights that can play a role in future production. Staff will need to be trained, as this new data will be incorporated into daily decision making.
With a rich 20-year history, Informatica understands data ecosystems. Employees become wasted investments when they do not have access to the trusted data they need in order to deliver their true value.
Who wants to spend their time recreating data sets to find a nugget of value only to discover it can’t be implemented?
Build a analytical ecosystem with a balanced focus on all aspects of data management. This will mean that value delivery is limited only by the imagination of your employees. Rather than questioning the value of an analytics team, you will attract some of the best and the brightest. Then, you will finally be able to deliver on the promised value of your data.
The transition to value-based care is well underway. From healthcare delivery organizations to clinicians, payers, and patients, everyone feels the impact. Each has a role to play. Moving to a value-driven model demands agility from people, processes, and technology. Organizations that succeed in this transformation will be those in which:
- Collaboration is commonplace
- Clinicians and business leaders wear new hats
- Data is recognized as an enterprise asset
The ability to leverage data will differentiate the leaders from the followers. Successful healthcare organizations will:
1) Establish analytics as a core competency
2) Rely on data to deliver best practice care
3) Engage patients and collaborate across the ecosystem to foster strong, actionable relationships
Trustworthy data is required to power the analytics that reveal the right answers, to define best practice guidelines and to identify and understand relationships across the ecosystem. In order to advance, data integration must also be agile. The right answers do not live in a single application. Instead, the right answers are revealed by integrating data from across the entire ecosystem. For example, in order to deliver personalized medicine, you must analyze an integrated view of data from numerous sources. These sources could include multiple EMRs, genomic data, data marts, reference data and billing data.
A recent PWC survey showed that 62% of executives believe data integration will become a competitive advantage. However, a July 2013 Information Week survey reported that 40% of healthcare executives gave their organization only a grade D or F on preparedness to manage the data deluge.
What grade would you give your organization?
You can improve your organization’s grade, but it will require collaboration between business and IT. If you are in IT, you’ll need to collaborate with business users who understand the data. You must empower them with self-service tools for improving data quality and connecting data. If you are a business leader, you need to understand and take an active role with the data.
To take the next step, download our new eBook, “Potential Unlocked: Transforming healthcare by putting information to work.” In it, you’ll learn:
- How to put your information to work
- New ways to govern your data
- What other healthcare organizations are doing
- How to overcome common barriers
So go ahead, download it now and let me know what you think. I look forward to hearing your questions and comments….oh, and your grade!
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review around how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions ranging from what product to develop next to who to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation and as they spread supply will attempt to catch up to create an equilibrium. However, this is about a problem, which has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sector. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into a meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why you ask? Because at some point I would like to have at least a business-super user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in memory DB…) and massage it so that his self-created dashboard gets him/her where (s)he needs to go. This should include concepts like; “where is the data I need for this insight?’, “what is missing and how do I get to that piece in the best way?”, “how do I want it to look to share it?” All that is required should be a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
As I continue to counsel insurers about master data, they all agree immediately that it is something they need to get their hands around fast. If you ask participants in a workshop at any carrier; no matter if life, p&c, health or excess, they all raise their hands when I ask, “Do you have broadband bundle at home for internet, voice and TV as well as wireless voice and data?”, followed by “Would you want your company to be the insurance version of this?”
Now let me be clear; while communication service providers offer very sophisticated bundles, they are also still grappling with a comprehensive view of a client across all services (data, voice, text, residential, business, international, TV, mobile, etc.) each of their touch points (website, call center, local store). They are also miles away of including any sort of meaningful network data (jitter, dropped calls, failed call setups, etc.)
Similarly, my insurance investigations typically touch most of the frontline consumer (business and personal) contact points including agencies, marketing (incl. CEM & VOC) and the service center. On all these we typically see a significant lack of productivity given that policy, billing, payments and claims systems are service line specific, while supporting functions from developing leads and underwriting to claims adjucation often handle more than one type of claim.
This lack of performance is worsened even more by the fact that campaigns have sub-optimal campaign response and conversion rates. As touchpoint-enabling CRM applications also suffer from a lack of complete or consistent contact preference information, interactions may violate local privacy regulations. In addition, service centers may capture leads only to log them into a black box AS400 policy system to disappear.
Here again we often hear that the fix could just happen by scrubbing data before it goes into the data warehouse. However, the data typically does not sync back to the source systems so any interaction with a client via chat, phone or face-to-face will not have real time, accurate information to execute a flawless transaction.
On the insurance IT side we also see enormous overhead; from scrubbing every database from source via staging to the analytical reporting environment every month or quarter to one-off clean up projects for the next acquired book-of-business. For a mid-sized, regional carrier (ca. $6B net premiums written) we find an average of $13.1 million in annual benefits from a central customer hub. This figure results in a ROI of between 600-900% depending on requirement complexity, distribution model, IT infrastructure and service lines. This number includes some baseline revenue improvements, productivity gains and cost avoidance as well as reduction.
On the health insurance side, my clients have complained about regional data sources contributing incomplete (often driven by local process & law) and incorrect data (name, address, etc.) to untrusted reports from membership, claims and sales data warehouses. This makes budgeting of such items like medical advice lines staffed by nurses, sales compensation planning and even identifying high-risk members (now driven by the Affordable Care Act) a true mission impossible, which makes the life of the pricing teams challenging.
Over in the life insurers category, whole and universal life plans now encounter a situation where high value clients first faced lower than expected yields due to the low interest rate environment on top of front-loaded fees as well as the front loading of the cost of the term component. Now, as bonds are forecast to decrease in value in the near future, publicly traded carriers will likely be forced to sell bonds before maturity to make good on term life commitments and whole life minimum yield commitments to keep policies in force.
This means that insurers need a full profile of clients as they experience life changes like a move, loss of job, a promotion or birth. Such changes require the proper mitigation strategy, which can be employed to protect a baseline of coverage in order to maintain or improve the premium. This can range from splitting term from whole life to using managed investment portfolio yields to temporarily pad premium shortfalls.
Overall, without a true, timely and complete picture of a client and his/her personal and professional relationships over time and what strategies were presented, considered appealing and ultimately put in force, how will margins improve? Surely, social media data can help here but it should be a second step after mastering what is available in-house already. What are some of your experiences how carriers have tried to collect and use core customer data?