Category Archives: Governance, Risk and Compliance
If you build an IT Architecture, it will be a constant up-hill battle to get business users and executives engaged and take ownership of data governance and data quality. In short you will struggle to maximize the information potential in your enterprise. But if you develop and Enterprise Architecture that starts with a business and operational view, the dynamics change dramatically. To make this point, let’s take a look at a case study from Cisco. (more…)
Murphy’s First Law of Bad Data – If You Make A Small Change Without Involving Your Client – You Will Waste Heaps Of Money
I have not used my personal encounter with bad data management for over a year but a couple of weeks ago I was compelled to revive it. Why you ask? Well, a complete stranger started to receive one of my friend’s text messages – including mine – and it took days for him to detect it and a week later nobody at this North American wireless operator had been able to fix it. This coincided with a meeting I had with a European telco’s enterprise architecture team. There was no better way to illustrate to them how a customer reacts and the risk to their operations, when communication breaks down due to just one tiny thing changing – say, his address (or in the SMS case, some random SIM mapping – another type of address).
In my case, I moved about 250 miles within the United States a couple of years ago and this seemingly common experience triggered a plethora of communication screw ups across every merchant a residential household engages with frequently, e.g. your bank, your insurer, your wireless carrier, your average retail clothing store, etc.
For more than two full years after my move to a new state, the following things continued to pop up on a monthly basis due to my incorrect customer data:
- In case of my old satellite TV provider they got to me (correct person) but with a misspelled last name at my correct, new address.
- My bank put me in a bit of a pickle as they sent “important tax documentation”, which I did not want to open as my new tenants’ names (in the house I just vacated) was on the letter but with my new home’s address.
- My mortgage lender sends me a refinancing offer to my new address (right person & right address) but with my wife’s as well as my name completely butchered.
- My wife’s airline, where she enjoys the highest level of frequent flyer status, continually mails her offers duplicating her last name as her first name.
- A high-end furniture retailer sends two 100-page glossy catalogs probably costing $80 each to our address – one for me, one for her.
- A national health insurer sends “sensitive health information” (disclosed on envelope) to my new residence’s address but for the prior owner.
- My legacy operator turns on the wrong premium channels on half my set-top boxes.
- The same operator sends me a SMS the next day thanking me for switching to electronic billing as part of my move, which I did not sign up for, followed by payment notices (as I did not get my invoice in the mail). When I called this error out for the next three months by calling their contact center and indicating how much revenue I generate for them across all services, they counter with “sorry, we don’t have access to the wireless account data”, “you will see it change on the next bill cycle” and “you show as paper billing in our system today”.
Ignoring the potential for data privacy law suits, you start wondering how long you have to be a customer and how much money you need to spend with a merchant (and they need to waste) for them to take changes to your data more seriously. And this are not even merchants to whom I am brand new – these guys have known me and taken my money for years!
One thing I nearly forgot…these mailings all happened at least once a month on average, sometimes twice over 2 years. If I do some pigeon math here, I would have estimated the postage and production cost alone to run in the hundreds of dollars.
However, the most egregious trespass though belonged to my home owner’s insurance carrier (HOI), who was also my mortgage broker. They had a double whammy in store for me. First, I received a cancellation notice from the HOI for my old residence indicating they had cancelled my policy as the last payment was not received and that any claims will be denied as a consequence. Then, my new residence’s HOI advised they added my old home’s HOI to my account.
After wondering what I could have possibly done to trigger this, I called all four parties (not three as the mortgage firm did not share data with the insurance broker side – surprise, surprise) to find out what had happened.
It turns out that I had to explain and prove to all of them how one party’s data change during my move erroneously exposed me to liability. It felt like the old days, when seedy telco sales people needed only your name and phone number and associate it with some sort of promotion (back of a raffle card to win a new car), you never took part in, to switch your long distance carrier and present you with a $400 bill the coming month. Yes, that also happened to me…many years ago. Here again, the consumer had to do all the legwork when someone (not an automatic process!) switched some entry without any oversight or review triggering hours of wasted effort on their and my side.
We can argue all day long if these screw ups are due to bad processes or bad data, but in all reality, even processes are triggered from some sort of underlying event, which is something as mundane as a database field’s flag being updated when your last purchase puts you in a new marketing segment.
Now imagine you get married and you wife changes her name. With all these company internal (CRM, Billing, ERP), free public (property tax), commercial (credit bureaus, mailing lists) and social media data sources out there, you would think such everyday changes could get picked up quicker and automatically. If not automatically, then should there not be some sort of trigger to kick off a “governance” process; something along the lines of “email/call the customer if attribute X has changed” or “please log into your account and update your information – we heard you moved”. If American Express was able to detect ten years ago that someone purchased $500 worth of product with your credit card at a gas station or some lingerie website, known for fraudulent activity, why not your bank or insurer, who know even more about you? And yes, that happened to me as well.
Tell me about one of your “data-driven” horror scenarios?
Informatica announced, once again, that it is listed as a leader in the industry’s second Gartner Magic Quadrant for Data Masking Technology. With data security continuing to grow as one of the fastest segments in the enterprise software market, technologies such as data masking are becoming the solution of choice for data-centric security.
Increased fear of cyber-attacks and internal data breaches has made predictions that 2014 is the year of preventative and tactical measures to ensure corporate data assets are safe. Data masking should be included in those measures. According to Gartner,
“Security program managers need to take a strategic approach with tactical best-practice technology configurations in order to properly address the most common advanced targeted attack scenarios to increase both detection and prevention capabilities.”
Without these measures, the cost of an attack or breach is growing every year. The Ponemon Institute posted in a recent study:
“The 2013 Cost of Cyber Crime Study states that the average annualized cost of cybercrime incurred by a benchmark sample of US organizations was $11.56 million, nearly 78% more than the cost estimated in the first analysis conducted 4 years ago.”
Informatica believes that the best preventative measures include a layered approach for data security but without sacrificing agility or adding unnecessary costs. Data Masking delivers data-centric security with improved productivity and reduced overall costs.
Data Masking prevents internal data theft and abuse of sensitive data by hiding it from users. Data masking techniques include replacing some fields with similar-looking characters, masking characters (for example, “x”), substituting real last names with fictional last names and shuffling data within columns – to name a few. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization . Call it what you like, but without it – organizations may continue to expose sensitive data to those with mal intentions.
To learn more, Download the Gartner Magic Quadrant Data Masking Report now. And visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose
As I continue to counsel insurers about master data, they all agree immediately that it is something they need to get their hands around fast. If you ask participants in a workshop at any carrier; no matter if life, p&c, health or excess, they all raise their hands when I ask, “Do you have broadband bundle at home for internet, voice and TV as well as wireless voice and data?”, followed by “Would you want your company to be the insurance version of this?”
Now let me be clear; while communication service providers offer very sophisticated bundles, they are also still grappling with a comprehensive view of a client across all services (data, voice, text, residential, business, international, TV, mobile, etc.) each of their touch points (website, call center, local store). They are also miles away of including any sort of meaningful network data (jitter, dropped calls, failed call setups, etc.)
Similarly, my insurance investigations typically touch most of the frontline consumer (business and personal) contact points including agencies, marketing (incl. CEM & VOC) and the service center. On all these we typically see a significant lack of productivity given that policy, billing, payments and claims systems are service line specific, while supporting functions from developing leads and underwriting to claims adjucation often handle more than one type of claim.
This lack of performance is worsened even more by the fact that campaigns have sub-optimal campaign response and conversion rates. As touchpoint-enabling CRM applications also suffer from a lack of complete or consistent contact preference information, interactions may violate local privacy regulations. In addition, service centers may capture leads only to log them into a black box AS400 policy system to disappear.
Here again we often hear that the fix could just happen by scrubbing data before it goes into the data warehouse. However, the data typically does not sync back to the source systems so any interaction with a client via chat, phone or face-to-face will not have real time, accurate information to execute a flawless transaction.
On the insurance IT side we also see enormous overhead; from scrubbing every database from source via staging to the analytical reporting environment every month or quarter to one-off clean up projects for the next acquired book-of-business. For a mid-sized, regional carrier (ca. $6B net premiums written) we find an average of $13.1 million in annual benefits from a central customer hub. This figure results in a ROI of between 600-900% depending on requirement complexity, distribution model, IT infrastructure and service lines. This number includes some baseline revenue improvements, productivity gains and cost avoidance as well as reduction.
On the health insurance side, my clients have complained about regional data sources contributing incomplete (often driven by local process & law) and incorrect data (name, address, etc.) to untrusted reports from membership, claims and sales data warehouses. This makes budgeting of such items like medical advice lines staffed by nurses, sales compensation planning and even identifying high-risk members (now driven by the Affordable Care Act) a true mission impossible, which makes the life of the pricing teams challenging.
Over in the life insurers category, whole and universal life plans now encounter a situation where high value clients first faced lower than expected yields due to the low interest rate environment on top of front-loaded fees as well as the front loading of the cost of the term component. Now, as bonds are forecast to decrease in value in the near future, publicly traded carriers will likely be forced to sell bonds before maturity to make good on term life commitments and whole life minimum yield commitments to keep policies in force.
This means that insurers need a full profile of clients as they experience life changes like a move, loss of job, a promotion or birth. Such changes require the proper mitigation strategy, which can be employed to protect a baseline of coverage in order to maintain or improve the premium. This can range from splitting term from whole life to using managed investment portfolio yields to temporarily pad premium shortfalls.
Overall, without a true, timely and complete picture of a client and his/her personal and professional relationships over time and what strategies were presented, considered appealing and ultimately put in force, how will margins improve? Surely, social media data can help here but it should be a second step after mastering what is available in-house already. What are some of your experiences how carriers have tried to collect and use core customer data?
Recommendations and illustrations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warrantee or representation of success, either express or implied, is made.
That tag line got your attention – did it not? Last week I talked about how companies are trying to squeeze more value out of their asset data (e.g. equipment of any kind) and the systems that house it. I also highlighted the fact that IT departments in many companies with physical asset-heavy business models have tried (and often failed) to create a consistent view of asset data in a new ERP or data warehouse application. These environments are neither equipped to deal with all life cycle aspects of asset information, nor are they fixing the root of the data problem in the sources, i.e. where the stuff is and what it look like. It is like a teenager whose parents have spent thousands of dollars on buying him the latest garments but he always wears the same three outfits because he cannot find the other ones in the pile he hoardes under her bed. And now they bought him a smart phone to fix it. So before you buy him the next black designer shirt, maybe it would be good to find out how many of the same designer shirts he already has, what state they are in and where they are.
Recently, I had the chance to work on a like problem with a large overseas oil & gas company and a North American utility. Both are by definition asset heavy, very conservative in their business practices, highly regulated, very much dependent on outside market forces such as the oil price and geographically very dispersed; and thus, by default a classic system integration spaghetti dish.
My challenge was to find out where the biggest opportunities were in terms of harnessing data for financial benefit.
The initial sense in oil & gas was that most of the financial opportunity hidden in asset data was in G&G (geophysical & geological) and the least on the retail side (lubricants and gas for sale at operated gas stations). On the utility side, the go to area for opportunity appeared to be maintenance operations. Let’s say that I was about right with these assertions but that there were a lot more skeletons in the closet with diamond rings on their fingers than I anticipated.
After talking extensively with a number of department heads in the oil company; starting with the IT folks running half of the 400 G&G applications, the ERP instances (turns out there were 5, not 1) and the data warehouses (3), I queried the people in charge of lubricant and crude plant operations, hydrocarbon trading, finance (tax, insurance, treasury) as well as supply chain, production management, land management and HSE (health, safety, environmental).
The net-net was that the production management people said that there is no issue as they already cleaned up the ERP instance around customer and asset (well) information. The supply chain folks also indicated that they have used another vendor’s MDM application to clean up their vendor data, which funnily enough was not put back into the procurement system responsible for ordering parts. The data warehouse/BI team was comfortable that they cleaned up any information for supply chain, production and finance reports before dimension and fact tables were populated for any data marts.
All of this was pretty much a series of denial sessions on your 12-step road to recovery as the IT folks had very little interaction with the business to get any sense of how relevant, correct, timely and useful these actions are for the end consumer of the information. They also had to run and adjust fixes every month or quarter as source systems changed, new legislation dictated adjustments and new executive guidelines were announced.
While every department tried to run semi-automated and monthly clean up jobs with scripts and some off-the-shelve software to fix their particular situation, the corporate (holding) company and any downstream consumers had no consistency to make sensible decisions on where and how to invest without throwing another legion of bodies (by now over 100 FTEs in total) at the same problem.
So at every stage of the data flow from sources to the ERP to the operational BI and lastly the finance BI environment, people repeated the same tasks: profile, understand, move, aggregate, enrich, format and load.
Despite the departmental clean-up efforts, areas like production operations did not know with certainty (even after their clean up) how many well heads and bores they had, where they were downhole and who changed a characteristic as mundane as the well name last and why (governance, location match).
Marketing (Trading) was surprisingly open about their issues. They could not process incoming, anchored crude shipments into inventory or assess who the counterparty they sold to was owned by and what payment terms were appropriate given the credit or concentration risk associated (reference data, hierarchy mgmt.). As a consequence, operating cash accuracy was low despite ongoing improvements in the process and thus, incurred opportunity cost.
Operational assets like rig equipment had excess insurance coverage (location, operational data linkage) and fines paid to local governments for incorrectly filing or not renewing work visas was not returned for up to two years incurring opportunity cost (employee reference data).
A big chunk of savings was locked up in unplanned NPT (non-production time) because inconsistent, incorrect well data triggered incorrect maintenance intervals. Similarly, OEM specific DCS (drill control system) component software was lacking a central reference data store, which did not trigger alerts before components failed. If you add on top a lack of linkage of data served by thousands of sensors via well logs and Pi historians and their ever changing roll-up for operations and finance, the resulting chaos is complete.
One approach we employed around NPT improvements was to take the revenue from production figure from their 10k and combine it with the industry benchmark related to number of NPT days per 100 day of production (typically about 30% across avg depth on & offshore types). Then you overlay it with a benchmark (if they don’t know) how many of these NPT days were due to bad data, not equipment failure or alike, and just fix a portion of that, you are getting big numbers.
When I sat back and looked at all the potential it came to more than $200 million in savings over 5 years and this before any sensor data from rig equipment, like the myriad of siloed applications running within a drill control system, are integrated and leveraged via a Hadoop cluster to influence operational decisions like drill string configuration or asmyth.
Next time I’ll share some insight into the results of my most recent utility engagement but I would love to hear from you what your experience is in these two or other similar industries.
Recommendations contained in this post are estimates only and are based entirely upon information provided by the prospective customer and on our observations. While we believe our recommendations and estimates to be sound, the degree of success achieved by the prospective customer is dependent upon a variety of factors, many of which are not under Informatica’s control and nothing in this post shall be relied upon as representative of the degree of success that may, in fact, be realized and no warrantee or representation of success, either express or implied, is made.
I believe that most in the software business believe that it is tough enough to calculate and hence financially justify the purchase or build of an application - especially middleware – to a business leader or even a CIO. Most of business-centric IT initiatives involve improving processes (order, billing, service) and visualization (scorecarding, trending) for end users to be more efficient in engaging accounts. Some of these have actually migrated to targeting improvements towards customers rather than their logical placeholders like accounts. Similar strides have been made in the realm of other party-type (vendor, employee) as well as product data. They also tackle analyzing larger or smaller data sets and providing a visual set of clues on how to interpret historical or predictive trends on orders, bills, usage, clicks, conversions, etc.
If you think this is a tough enough proposition in itself, imagine the challenge of quantifying the financial benefit derived from understanding where your “hardware” is physically located, how it is configured, who maintained it, when and how. Depending on the business model you may even have to figure out who built it or owns it. All of this has bottom-line effects on how, who and when expenses are paid and revenues get realized and recognized. And then there is the added complication that these dimensions of hardware are often fairly dynamic as they can also change ownership and/or physical location and hence, tax treatment, insurance risk, etc.
Such hardware could be a pump, a valve, a compressor, a substation, a cell tower, a truck or components within these assets. Over time, with new technologies and acquisitions coming about, the systems that plan for, install and maintain these assets become very departmentalized in terms of scope and specialized in terms of function. The same application that designs an asset for department A or region B, is not the same as the one accounting for its value, which is not the same as the one reading its operational status, which is not the one scheduling maintenance, which is not the same as the one billing for any repairs or replacement. The same folks who said the Data Warehouse is the “Golden Copy” now say the “new ERP system” is the new central source for everything. Practitioners know that this is either naiveté or maliciousness. And then there are manual adjustments….
Moreover, to truly take squeeze value out of these assets being installed and upgraded, the massive amounts of data they generate in a myriad of formats and intervals need to be understood, moved, formatted, fixed, interpreted at the right time and stored for future use in a cost-sensitive, easy-to-access and contextual meaningful way.
I wish I could tell you one application does it all but the unsurprising reality is that it takes a concoction of multiple. None or very few asset life cycle-supporting legacy applications will be retired as they often house data in formats commensurate with the age of the assets they were built for. It makes little financial sense to shut down these systems in a big bang approach but rather migrate region after region and process after process to the new system. After all, some of the assets have been in service for 50 or more years and the institutional knowledge tied to them is becoming nearly as old. Also, it is probably easier to engage in often required manual data fixes (hopefully only outliers) bit-by-bit, especially to accommodate imminent audits.
So what do you do in the meantime until all the relevant data is in a single system to get an enterprise-level way to fix your asset tower of Babel and leverage the data volume rather than treat it like an unwanted step child? Most companies, which operate in asset, fixed-cost heavy business models do not want to create a disruption but a steady tuning effect (squeezing the data orange), something rather unsexy in this internet day and age. This is especially true in “older” industries where data is still considered a necessary evil, not an opportunity ready to exploit. Fact is though; that in order to improve the bottom line, we better get going, even if it is with baby steps.
If you are aware of business models and their difficulties to leverage data, write to me. If you even know about an annoying, peculiar or esoteric data “domain”, which does not lend itself to be easily leveraged, share your thoughts. Next time, I will share some examples on how certain industries try to work in this environment, what they envision and how they go about getting there.
A data integration hub is a proven vehicle to provide a self service model for publishing and subscribing data to be made available to a variety of users. For those who deploy these environments for regulated and sensitive data need to think of data privacy and data governance during the design phase of the project.
In the data integration hub architecture, think about how sensitive data will be coming from different locations, from a variety of technology platforms, and certainly from systems being managed by teams with a wide range of data security skills. How can you ensure data will be protected across such a heterogeneous environment? Not to mention if data traverses across national boundaries.
Then think about testing connectivity. If data needs to be validated in a data quality rules engine, in order to truly test this connectivity, there needs to be a capability to test using valid data. However testers should not have access or visibility into the actual data itself if it is classified as sensitive or confidential.
With a hub and spoke model, the rules are difficult to enforce if data is being requested from one country and received in another. The opportunity for exposing human error and potential data leakage increases exponentially. Rather than reading about a breach in the headlines, it may make sense to look at building preventative measures or spending the time and money to do the right thing from the onset of the project.
There are technologies that exist in the market that are easy to implement that are designed to prevent this very type of exposure. This technology is called data masking which includes data obfuscation, encryption and tokenization. Informatica’s Data Privacy solution based on persistent and dynamic data masking options can be easily and quickly deployed without the need to develop code or modify the source or target application.
When developing your reference architecture for a data integration hub, incorporate sound data governance policies and build data privacy into the application upfront. Don’t wait for the headlines to include your company and someone’s personal data.
In the Information Age we live and work in, where it’s hard to go even one day without a Google search, where do you turn for insights that can help you solve work challenges and progress your career? This is a tough question. How can we deal with the challenges of information overload – which some have called information pollution? (more…)
What is In-Database Archiving in Oracle 12c and Why You Still Need a Database Archiving Solution to Complement It (Part 2)
In my last blog on this topic, I discussed several areas where a database archiving solution can complement or help you to better leverage the Oracle In-Database Archiving feature. For an introduction of what the new In-Database Archiving feature in Oracle 12c is, refer to Part 1 of my blog on this topic.
Here, I will discuss additional areas where a database archiving solution can complement the new Oracle In-Database Archiving feature:
- Graphical UI for ease of administration – In database archiving is currently a technical feature of Oracle database, and not easily visible or mange-able outside of the DBA persona. This is where a database archiving solution provides a more comprehensive set of graphical user interfaces (GUI) that makes this feature easier to monitor and manage.
- Enabling application of In-Database Archiving for packaged applications and complex data models – Concepts of business entities or transactional records composed of related tables to maintain data and referential integrity as you archive, move, purge, and retain data, as well as business rules to determine when data has become inactive and can therefore be safely archived allow DBAs to apply this new Oracle feature to more complex data models. Also, the availability of application accelerators (prebuilt metadata of business entities and business rules for packaged applications) enables the application of In-Database Archiving to packaged applications like Oracle E-Business Suite, PeopleSoft, Siebel, and JD Edwards
What is In-Database Archiving in Oracle 12c and Why You Still Need a Database Archiving Solution to Complement It (Part 1)
What is the new In-Database Archiving in the latest Oracle 12c release?
On June 25, 2013, Oracle introduced a new feature called In-Database Archiving with its new release of Oracle 12. “In-Database Archiving enables you to archive rows within a table by marking them as inactive. These inactive rows are in the database and can be optimized using compression, but are not visible to an application. The data in these rows is available for compliance purposes if needed by setting a session parameter. With In-Database Archiving you can store more data for a longer period of time within a single database, without compromising application performance. Archived data can be compressed to help improve backup performance, and updates to archived data can be deferred during application upgrades to improve the performance of upgrades.”
This is an Oracle specific feature and does not apply to other databases.