Category Archives: Data Warehousing
This creative thinking to solve a problem came from a request to build a soldier knife from the Swiss Army. In the end, the solution was all about getting the right tool for the right job in the right place. In many cases soldiers didn’t need industrial strength tools, all they really needed was a compact and lightweight tool to get the job at hand done quickly.
Putting this into perspective with today’s world of Data Integration, using enterprise-class data integration tools for the smaller data integration project is over kill and typically out of reach for the smaller organization. However, these smaller data integration projects are just as important as those larger enterprise projects, and they are often the innovation behind a new way of business thinking. The traditional hand-coding approach to addressing the smaller data integration project is not-scalable, not-repeatable and prone to human error, what’s needed is a compact, flexible and powerful off-the-shelf tool.
Thankfully, over a century after the world embraced the Swiss Army Knife, someone at Informatica was paying attention to revolutionary ideas. If you’ve not yet heard the news about the Informatica platform, a version called PowerCenter Express has been released and it is free of charge so you can use it to handle an assortment of what I’d characterize as high complexity / low volume data integration challenges and experience a subset of the Informatica platform for yourself. I’d emphasize that PowerCenter Express doesn’t replace the need for Informatica’s enterprise grade products, but it is ideal for rapid prototyping, profiling data, and developing quick proof of concepts.
PowerCenter Express provides a glimpse of the evolving Informatica platform by integrating four Informatica products into a single, compact tool. There are no database dependencies and the product installs in just under 10 minutes. Much to my own surprise, I use PowerCenter express quite often going about the various aspects of my job with Informatica. I have it installed on my laptop so it travels with me wherever I go. It starts up quickly so it’s ideal for getting a little work done on an airplane.
For example, recently I wanted to explore building some rules for an upcoming proof of concept on a plane ride home so I could claw back some personal time for my weekend. I used PowerCenter Express to profile some data and create a mapping. And this mapping wasn’t something I needed to throw away and recreate in an enterprise version after my flight landed. Vibe, Informatica’s build once / run anywhere metadata driven architecture allows me to export a mapping I create in PowerCenter Express to one of the enterprise versions of Informatica’s products such as PowerCenter, DataQuality or Informatica Cloud.
As I alluded to earlier in this article, being a free offering I honestly didn’t expect too much from PowerCenter Express when I first started exploring it. However, due to my own positive experiences, I now like to think of PowerCenter Express as the Swiss Army Knife of Data Integration.
To start claiming back some of your personal time, get started with the free version of PowerCenter Express, found on the Informatica Marketplace at: https://community.informatica.com/solutions/pcexpress
Getting started with Cloud Data Warehousing using Amazon Redshift is now easier than ever, thanks to the Informatica Cloud’s 60-day trial for Amazon Redshift. Now, anyone can easily and quickly move data from any on-premise, cloud, Big Data, or relational data sources into Amazon Redshift without writing a single line of code and without being a data integration expert. You can use Informatica Cloud’s six-step wizard to quickly replicate your data or use the productivity-enhancing cloud integration designer to tackle more advanced use cases, such as combining multiple data sources into one Amazon Redshift table. Existing Informatica PowerCenter users can use Informatica Cloud and Amazon Redshift to extend an existing data warehouse with through an affordable and scalable approach. If you are currently exploring self-service business intelligence solutions such as Birst, Tableau, or Microstrategy, the combination of Redshift and Informatica Cloud makes it incredibly easy to prepare the data for analytics by any BI solution.
To get started, execute the following steps:
- Go to http://informaticacloud.com/cloud-trial-for-redshift and click on the ‘Sign Up Now’ link
- You’ll be taken to the Informatica Marketplace listing for the Amazon Redshift trial. Sign up for a Marketplace account if you don’t already have one, and then click on the ‘Start Free Trial Now’ button
- You’ll then be prompted to login with your Informatica Cloud account. If you do not have an Informatica Cloud username and password, register one by clicking the appropriate link and fill in the required details
- Once you finish registration and obtain your login details, download the Vibe ™ Secure Agent to your Amazon EC2 virtual machine (or to a local Windows or Linux instance), and ensure that it can access your Amazon S3 bucket and Amazon Redshift cluster.
- Ensure that your S3 bucket, and Redshift cluster are both in the same availability zone
- To start using the Informatica Cloud connector for Amazon Redshift, create a connection to your Amazon Redshift nodes by providing your AWS Access Key ID and Secret Access Key, specifying your cluster details, and obtaining your JDBC URL string.
You are now ready to begin moving data to and from Amazon Redshift by creating your first Data Synchronization task (available under Applications). Pick a source, pick your Redshift target, map the fields, and you’re done!
The value of using Informatica Cloud to load data into Amazon Redshift is the ability of the application to move massive amounts of data in parallel. The Informatica engine optimizes by moving processing close to where the data is using push-down technology. Unlike other data integration solutions for Redshift that perform batch processing using an XML engine which is inherently slow when processing large data volumes and don’t have multitenant architectures that scale well, Informatica Cloud processes over 2 billion transactions every day.
Amazon Redshift has brought agility, scalability, and affordability to petabyte-scale data warehousing, and Informatica Cloud has made it easy to transfer all your structured and unstructured data into Redshift so you can focus on getting data insights today, not weeks from now.
It is troublesome to me to repeatedly get into conversations with IT managers who want to fix data “for the sake of fixing it”. While this is presumably increasingly rare, due to my department’s role, we probably see a higher occurrence than the normal software vendor employee. Given that, please excuse the inflammatory title of this post.
Nevertheless, once the deal is done, we find increasingly fewer of these instances, yet still enough, as the average implementation consultant or developer cares about this aspect even less. A few months ago a petrochemical firm’s G&G IT team lead told me that he does not believe that data quality improvements can or should be measured. He also said, “if we need another application, we buy it. End of story.” Good for software vendors, I thought, but in most organizations $1M here or there do not lay around leisurely plus decision makers want to see the – dare I say it – ROI.
However, IT and business leaders should take note that a misalignment due to lack OR disregard of communication is a critical success factor. If the business does not get what it needs and wants AND it differs what Corporate IT is envisioning and working on – and this is what I am talking about here – it makes any IT investment a risky proposition.
Let me illustrate this with 4 recent examples I ran into:
1. Potential for flawed prioritization
A retail customer’s IT department apparently knew that fixing and enriching a customer loyalty record across the enterprise is a good and financially rewarding idea. They only wanted to understand what the less-risky functional implementation choices where. They indicated that if they wanted to learn what the factual financial impact of “fixing” certain records or attributes, they would just have to look into their enterprise data warehouse. This is where the logic falls apart as the warehouse would be just as unreliable as the “compromised” applications (POS, mktg, ERP) feeding it.
Even if they massaged the data before it hit the next EDW load, there is nothing inherently real-time about this as all OLTP are running processes of incorrect (no bidirectional linkage) and stale data (since the last load).
I would question if the business is now completely aligned with what IT is continuously correcting. After all, IT may go for the “easy or obvious” fixes via a weekly or monthly recurring data scrub exercise without truly knowing, which the “biggest bang for the buck” is or what the other affected business use cases are, they may not even be aware of yet. Imagine the productivity impact of all the roundtripping and delay in reporting this creates. This example also reminds me of a telco client, I encountered during my tenure at another tech firm, which fed their customer master from their EDW and now just found out that this pattern is doomed to fail due to data staleness and performance.
2. Fix IT issues and business benefits will trickle down
Client number two is a large North American construction Company. An architect built a business case for fixing a variety of data buckets in the organization (CRM, Brand Management, Partner Onboarding, Mobility Services, Quotation & Requisitions, BI & EPM).
Grand vision documents existed and linked to the case, which stated how data would get better (like a sick patient) but there was no mention of hard facts of how each of the use cases would deliver on this. After I gave him some major counseling what to look out and how to flesh it out – radio silence. Someone got scared of the math, I guess.
3. Now that we bought it, where do we start
The third culprit was a large petrochemical firm, which apparently sat on some excess funds and thought (rightfully so) it was a good idea to fix their well attributes. More power to them. However, the IT team is now in a dreadful position having to justify to their boss and ultimately the E&P division head why they prioritized this effort so highly and spent the money. Well, they had their heart in the right place but are a tad late. Still, I consider this better late than never.
4. A senior moment
The last example comes from a South American communications provider. They seemingly did everything right given the results they achieved to date. This gets to show that misalignment of IT and business does not necessarily wreak havoc – at least initially.
However, they are now in phase 3 of their roll out and reality caught up with them. A senior moment or lapse in judgment maybe? Whatever it was; once they fixed their CRM, network and billing application data, they had to start talking to the business and financial analysts as complaints and questions started to trickle in. Once again, better late than never.
So what is the take-away from these stories. Why wait until phase 3, why have to be forced to cram some justification after the purchase? You pick, which one works best for you to fix this age-old issue. But please heed Sohaib’s words of wisdom recently broadcast on CNN Money “IT is a mature sector post bubble…..now it needs to deliver the goods”. And here is an action item for you – check out the new way for the business user to prepare their own data (30 minutes into the video!). Agreed?
“If I had my way, I’d fire the statisticians – all of them – they don’t add value”.
Surely not? Why would you fire the very people who were employed to make sense of the vast volumes of manufacturing data and guide future production? But he was right. The problem was at that time data management was so poor that data was simply not available for the statisticians to analyze.
So, perhaps this title should be re-written to be:
Fire your Data Scientists – They Aren’t Able to Add Value.
Although this statement is a bit extreme, the same situation may still exist. Data scientists frequently share frustrations such as:
- “I’m told our data is 60% accurate, which means I can’t trust any of it.”
- “We achieved our goal of an answer within a week by working 24 hours a day.”
- “Each quarter we manually prepare 300 slides to anticipate all questions the CFO may ask.”
- “Fred manually audits 10% of the invoices. When he is on holiday, we just don’t do the audit.”
This is why I think the original quote is so insightful. Value from data is not automatically delivered by hiring a statistician, analyst or data scientist. Even with the latest data mining technology, one person cannot positively influence a business without the proper data to support them.
Most organizations are unfamiliar with the structure required to deliver value from their data. New storage technologies will be introduced and a variety of analytics tools will be tried and tested. This change is crucial for to success. In order for statisticians to add value to a company, they must have access to high quality data that is easily sourced and integrated. That data must be available through the latest analytics technology. This new ecosystem should provide insights that can play a role in future production. Staff will need to be trained, as this new data will be incorporated into daily decision making.
With a rich 20-year history, Informatica understands data ecosystems. Employees become wasted investments when they do not have access to the trusted data they need in order to deliver their true value.
Who wants to spend their time recreating data sets to find a nugget of value only to discover it can’t be implemented?
Build a analytical ecosystem with a balanced focus on all aspects of data management. This will mean that value delivery is limited only by the imagination of your employees. Rather than questioning the value of an analytics team, you will attract some of the best and the brightest. Then, you will finally be able to deliver on the promised value of your data.
The transition to value-based care is well underway. From healthcare delivery organizations to clinicians, payers, and patients, everyone feels the impact. Each has a role to play. Moving to a value-driven model demands agility from people, processes, and technology. Organizations that succeed in this transformation will be those in which:
- Collaboration is commonplace
- Clinicians and business leaders wear new hats
- Data is recognized as an enterprise asset
The ability to leverage data will differentiate the leaders from the followers. Successful healthcare organizations will:
1) Establish analytics as a core competency
2) Rely on data to deliver best practice care
3) Engage patients and collaborate across the ecosystem to foster strong, actionable relationships
Trustworthy data is required to power the analytics that reveal the right answers, to define best practice guidelines and to identify and understand relationships across the ecosystem. In order to advance, data integration must also be agile. The right answers do not live in a single application. Instead, the right answers are revealed by integrating data from across the entire ecosystem. For example, in order to deliver personalized medicine, you must analyze an integrated view of data from numerous sources. These sources could include multiple EMRs, genomic data, data marts, reference data and billing data.
A recent PWC survey showed that 62% of executives believe data integration will become a competitive advantage. However, a July 2013 Information Week survey reported that 40% of healthcare executives gave their organization only a grade D or F on preparedness to manage the data deluge.
What grade would you give your organization?
You can improve your organization’s grade, but it will require collaboration between business and IT. If you are in IT, you’ll need to collaborate with business users who understand the data. You must empower them with self-service tools for improving data quality and connecting data. If you are a business leader, you need to understand and take an active role with the data.
To take the next step, download our new eBook, “Potential Unlocked: Transforming healthcare by putting information to work.” In it, you’ll learn:
- How to put your information to work
- New ways to govern your data
- What other healthcare organizations are doing
- How to overcome common barriers
So go ahead, download it now and let me know what you think. I look forward to hearing your questions and comments….oh, and your grade!
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review around how Big Data and Data Scientists will rule the world of the 21st century corporation and how they have to operate for maximum value. The thing I found rather disturbing was that it takes a PhD – probably a few of them – in a variety of math areas to give executives the necessary insight to make better decisions ranging from what product to develop next to who to sell it to and where.
Don’t get me wrong – this is mixed news for any enterprise software firm helping businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why is the design starting point for enterprise software not always similar to B2C applications, like an iPhone app, i.e. simpler is better? Why is it always such a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data not accommodate limited or existing skills but always accommodate the most complex scenarios? Well, the answer could be that the first customers will be very large, very complex organizations with super complex problems, which they were unable to solve so far. If analytical apps have become a self-service proposition for business users, data integration should be as well. So why does access to a lot of fast moving and diverse data require scarce PIG or Cassandra developers to get the data into an analyzable shape and a PhD to query and interpret patterns?
I realize new technologies start with a foundation and as they spread supply will attempt to catch up to create an equilibrium. However, this is about a problem, which has existed for decades in many industries, such as the oil & gas, telecommunication, public and retail sector. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me “yes, we got that – and by the way, we have been dealing with this reality for a long time”. By now I would have expected that the skill (cost) side of turning data into a meaningful insight would have been driven down more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next – and I just saw something recently that got me very excited. Why you ask? Because at some point I would like to have at least a business-super user pummel terabytes of transaction and interaction data into an environment (Hadoop cluster, in memory DB…) and massage it so that his self-created dashboard gets him/her where (s)he needs to go. This should include concepts like; “where is the data I need for this insight?’, “what is missing and how do I get to that piece in the best way?”, “how do I want it to look to share it?” All that is required should be a semi-experienced knowledge of Excel and PowerPoint to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it has surfaced?
As I continue to counsel insurers about master data, they all agree immediately that it is something they need to get their hands around fast. If you ask participants in a workshop at any carrier; no matter if life, p&c, health or excess, they all raise their hands when I ask, “Do you have broadband bundle at home for internet, voice and TV as well as wireless voice and data?”, followed by “Would you want your company to be the insurance version of this?”
Now let me be clear; while communication service providers offer very sophisticated bundles, they are also still grappling with a comprehensive view of a client across all services (data, voice, text, residential, business, international, TV, mobile, etc.) each of their touch points (website, call center, local store). They are also miles away of including any sort of meaningful network data (jitter, dropped calls, failed call setups, etc.)
Similarly, my insurance investigations typically touch most of the frontline consumer (business and personal) contact points including agencies, marketing (incl. CEM & VOC) and the service center. On all these we typically see a significant lack of productivity given that policy, billing, payments and claims systems are service line specific, while supporting functions from developing leads and underwriting to claims adjucation often handle more than one type of claim.
This lack of performance is worsened even more by the fact that campaigns have sub-optimal campaign response and conversion rates. As touchpoint-enabling CRM applications also suffer from a lack of complete or consistent contact preference information, interactions may violate local privacy regulations. In addition, service centers may capture leads only to log them into a black box AS400 policy system to disappear.
Here again we often hear that the fix could just happen by scrubbing data before it goes into the data warehouse. However, the data typically does not sync back to the source systems so any interaction with a client via chat, phone or face-to-face will not have real time, accurate information to execute a flawless transaction.
On the insurance IT side we also see enormous overhead; from scrubbing every database from source via staging to the analytical reporting environment every month or quarter to one-off clean up projects for the next acquired book-of-business. For a mid-sized, regional carrier (ca. $6B net premiums written) we find an average of $13.1 million in annual benefits from a central customer hub. This figure results in a ROI of between 600-900% depending on requirement complexity, distribution model, IT infrastructure and service lines. This number includes some baseline revenue improvements, productivity gains and cost avoidance as well as reduction.
On the health insurance side, my clients have complained about regional data sources contributing incomplete (often driven by local process & law) and incorrect data (name, address, etc.) to untrusted reports from membership, claims and sales data warehouses. This makes budgeting of such items like medical advice lines staffed by nurses, sales compensation planning and even identifying high-risk members (now driven by the Affordable Care Act) a true mission impossible, which makes the life of the pricing teams challenging.
Over in the life insurers category, whole and universal life plans now encounter a situation where high value clients first faced lower than expected yields due to the low interest rate environment on top of front-loaded fees as well as the front loading of the cost of the term component. Now, as bonds are forecast to decrease in value in the near future, publicly traded carriers will likely be forced to sell bonds before maturity to make good on term life commitments and whole life minimum yield commitments to keep policies in force.
This means that insurers need a full profile of clients as they experience life changes like a move, loss of job, a promotion or birth. Such changes require the proper mitigation strategy, which can be employed to protect a baseline of coverage in order to maintain or improve the premium. This can range from splitting term from whole life to using managed investment portfolio yields to temporarily pad premium shortfalls.
Overall, without a true, timely and complete picture of a client and his/her personal and professional relationships over time and what strategies were presented, considered appealing and ultimately put in force, how will margins improve? Surely, social media data can help here but it should be a second step after mastering what is available in-house already. What are some of your experiences how carriers have tried to collect and use core customer data?
Recommendations and illustrations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warrantee or representation of success, either express or implied, is made.
Unlike some of my friends, History was a subject in high school and college that I truly enjoyed. I particularly appreciated biographies of favorite historical figures because it painted a human face and gave meaning and color to the past. I also vowed at that time to navigate my life and future under the principle attributed to Harvard professor Jorge Agustín Nicolás Ruiz de Santayana y Borrás that goes, “Those who cannot remember the past are condemned to repeat it.”
So that’s a little ditty regarding my history regarding history.
Forwarding now to the present in which I have carved out my career in technology, and in particular, enterprise software, I’m afforded a great platform where I talk to lots of IT and business leaders. When I do, I usually ask them, “How are you implementing advanced projects that help the business become more agile or effective or opportunistically proactive?” They usually answer something along the lines of “this is the age and renaissance of data science and analytics” and then end up talking exclusively about their meat and potatoes business intelligence software projects and how 300 reports now run their business.
Then when I probe and hear their answer more in depth, I am once again reminded of THE history quote and think to myself there’s an amusing irony at play here. When I think about the Business Intelligence systems of today, most are designed to “remember” and report on the historical past through large data warehouses of a gazillion transactions, along with basic, but numerous shipping and billing histories and maybe assorted support records.
But when it comes right down to it, business intelligence “history” is still just that. Nothing is really learned and applied right when and where it counted – AND when it would have made all the difference had the company been able to react in time.
So, in essence, by using standalone BI systems as they are designed today, companies are indeed condemned to repeat what they have already learned because they are too late – so the same mistakes will be repeated again and again.
This means the challenge for BI is to reduce latency, measure the pertinent data / sensors / events, and get scalable – extremely scalable and flexible enough to handle the volume and variety of the forthcoming data onslaught.
There’s a part 2 to this story so keep an eye out for my next blog post History Repeats Itself (Part 2)
Data is everywhere. It’s in databases and applications spread across your enterprise. It’s in the hands of your customers and partners. It’s in cloud applications and cloud servers. It’s on spreadsheets and documents on your employee’s laptops and tablets. It’s in smartphones, sensors and GPS devices. It’s in the blogosphere, the twittersphere and your friends’ Facebook timelines. (more…)