Category Archives: Data Governance
Today, I am going to take a stab at rationalizing why one could even consider solving a problem with a solution that is well known to be sub-par. Consider the Ford Pinto: Would you choose this car for your personal, land-based transportation simply because of the plush dice hanging in the window? For my European readers, replace the Pinto with the infamous Trabant and you get my meaning. The fact is, both of these vehicles made the list of the “worst cars ever built” due to their mediocre design, environmental hazards or plain poor safety records.
Rational people would never choose a vehicle this way. So I always ask myself, “How can IT organizations rationalize buying product X just because product Y is thrown in for free?” Consider the case in which an organization chooses their CRM or BPM system simply because the vendor throws in an MDM or Data Quality Solution for free: Can this be done with a straight face? You often hear vendors claim that “everything in our house is pre-integrated”, “plug & play” or “we have accelerators for this.” I would hope that IT procurement officers have come to understand that these phrases don’t close a deal in a cloud-based environment. An on-premise construct can never achieve this Nirvana unless it is customized based on client requirements.
Anyone can see the logic in getting “2 for the price of 1.” However, as IT procurement organizations seek to save a percentage of money every deal, they can’t lose sight of this key fact:
Standing up software (configuring, customizing, maintaining) and operating it over several years requires CLOSE inspection and scrutiny.
Like a Ford Pinto, software cannot just be driven off the lot without a care, leaving you only to worry about changing the oil and filters at recommended intervals. Customization, operational risk and maintenance are significant costs, as all my seasoned padawans will know. If Pinto buyers had understood the Total Cost of Ownership before they made their purchase, they would have opted for Toyotas instead. Here is the bottom line:
If less than 10% of the overall requirements are solved by the free component
AND (and this is a big AND)
If less than 12% of the overall financial value is provided by the free component
Then it makes ZERO sense to select a solution based on freebie add-ons.
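To make the rule-of-thumb concrete, here it is as a toy decision function; a sketch only, with the 10% and 12% thresholds from above and the inputs expressed as the fractions of requirements and financial value attributable to the free component:

```python
# Toy encoding of the "10% requirements / 12% value" rule-of-thumb.
# req_share and value_share are fractions (0.0 to 1.0) of the overall deal
# that the free add-on actually covers.
def freebie_should_drive_decision(req_share: float, value_share: float) -> bool:
    # It makes ZERO sense only when BOTH shares fall below the thresholds.
    return not (req_share < 0.10 and value_share < 0.12)

print(freebie_should_drive_decision(0.08, 0.05))  # False: ignore the freebie
print(freebie_should_drive_decision(0.25, 0.20))  # True: worth weighing
```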
When an add-on component is of significantly lower quality than industry-leading solutions, it becomes even more illogical to rely on it simply because it’s “free.” If analysts have affirmed that the leading solutions have stronger capabilities, flexibility and scalability, what does an IT department truly “save” by choosing an inferior “free” add-on?
So just why DO procurement officers gravitate toward “free” add-ons rather than high-quality solutions? As a former procurement manager, I remember the motivations perfectly. Procurement teams are often measured by, and rewarded for, the savings they achieve. Because their motivation is near-term savings, long-term quality issues are not the primary decision driver. And, if IT fails to successfully communicate the risks, cost drivers and potential failure rates to Procurement, the motivation to save up-front money will win every time.
Both sellers and buyers need to avoid these dances of self-deception, the “Pre-Integration Tango” and the “Freebie Cha-Cha”. No matter how much you loved driving that Pinto or Trabant off the dealer lot, your opinion changed after you drove it for 50,000 miles.
I’ve been in procurement. I’ve built, sold and implemented “accelerators” and “blueprints.” In my opinion, 2-for-1 is usually a bad idea in software procurement. The best software is designed to make 1+1=3. I would love to hear from you if you agree with my above “10% requirements/12% value” rule-of-thumb. If not, let me know what your decision logic would be.
People are obsessed with data. Data captured from our smartphones. Internet data showing how we shop and search — and what marketers do with that data. Big Data, which I loosely define as people throwing every conceivable data point into a giant Hadoop cluster with the hope of figuring out what it all means.
Too bad all that attention stems from fear, uncertainty and doubt about the data that defines us. I blame the technology industry, which — in the immortal words of “Cool Hand Luke” — has had a “failure to communicate.” For decades we’ve talked the language of IT and left it up to our direct customers to explain the proper care-and-feeding of data to their business users. Small wonder it’s way too hard for regular people to understand what we, as an industry, are doing. After all, how can we expect others to explain the do’s and don’ts of data management when we haven’t clearly explained it ourselves?
I say we need to start talking about the ABC’s of handling data in a way that’s easy for anyone to understand. I’m convinced we can because — if you think about it — everything you learned about data you learned in kindergarten: It has to be clean, safe and connected. Here’s what I mean:
Data cleanliness has always been important, but it assumes real urgency with the move toward Big Data. I blame Hadoop, the underlying technology that makes Big Data possible. On the plus side, Hadoop gives companies a cost-effective way to store, process and analyze petabytes of nearly every imaginable data type. And that’s also the problem: companies must go through the enormous time suck of cataloging and organizing those vast stores of data. Put bluntly, big data can be a swamp.
The question is how to make it potable. This isn’t always easy, but it’s always, always necessary. It begins, naturally, by ensuring the data is accurate, deduplicated and complete.
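As a minimal illustration of those first checks (assuming a simple list of contact records with hypothetical field names):

```python
# A crude pass at deduplication and completeness checking.
records = [
    {"name": "Ann Lee",  "email": "ann@example.com", "phone": "555-0100"},
    {"name": "Ann  Lee", "email": "ann@example.com", "phone": None},
    {"name": "Bo Chan",  "email": "bo@example.com",  "phone": "555-0101"},
]

def match_key(rec: dict) -> tuple:
    # Lowercase the email and collapse whitespace in the name to catch
    # near-duplicate entries of the same person.
    return (rec["email"].lower(), " ".join(rec["name"].split()).lower())

deduped = list({match_key(r): r for r in records}.values())
incomplete = [r for r in records if not all(r.values())]
print(len(deduped), len(incomplete))  # 2 survivors, 1 record missing a phone
```

Real data quality tools go much further (fuzzy matching, survivorship rules), but the principle is the same.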
Now comes the truly difficult part: knowing where that data originated, where it’s been, and how it’s related to other data; in short, its lineage. That data provenance is absolutely vital in our hyper-connected world, where one company’s data interacts with data from suppliers, partners and customers. Someone else’s dirty data, regardless of origin, can ruin reputations and drive down sales faster than you can say “Target breach.” In fact, we now know that hackers entered Target’s point-of-sale terminals through a supplier’s project management and electronic billing system. We won’t know the full extent of the damage for a while. We do know the hack affected one-third of the entire U.S. population. Which brings us to:
Obviously, being safe means keeping data out of the hands of criminals. But it doesn’t stop there. That’s because today’s technologies make it oh so easy to misuse the data we have at our disposal. If we’re really determined to keep data safe, we have to think long and hard about responsibility and governance. We have to constantly question the data we use, and how we use it. Questions like:
- How much of our data should be accessible, and by whom?
- Do we really need to include personal information, like social security numbers or medical data, in our Hadoop clusters?
- When do we go the extra step of making that data anonymous?
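One way to start acting on those questions is to drop or pseudonymize PII before it ever lands in the cluster. A sketch, with hypothetical column names; hashing stands in for proper pseudonymization here and is not by itself a complete anonymization scheme:

```python
import hashlib

PII_DROP = {"ssn", "medical_record_number"}   # never load these at all
PII_PSEUDONYMIZE = {"email"}                  # replace with a stable token

def scrub(row: dict, salt: str = "per-dataset-salt") -> dict:
    out = {}
    for col, val in row.items():
        if col in PII_DROP:
            continue
        if col in PII_PSEUDONYMIZE and val is not None:
            val = hashlib.sha256((salt + val).encode()).hexdigest()[:12]
        out[col] = val
    return out

print(scrub({"ssn": "123-45-6789", "email": "ann@example.com", "zip": "02139"}))
# The SSN never makes it into the cluster; the email becomes a stable token.
```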
And as I think about it, I realize that everything we learned in kindergarten boils down to the ethics of data: How, for example, do we know if we’re using data for good or for evil?
That question is especially relevant for marketers, who have a tendency to use data to scare people, for crass commercialism, or to violate our privacy just because technology makes it possible. Use data ethically, and we can help change how data is used.
In fact, I believe that the ethics of data is such an important topic that I’ve decided to make it the title of my new blog.
Stay tuned for more musings on The Ethics of Data.
Leo Eweani makes the case that the data tsunami is coming. “Businesses are scrambling to respond and spending accordingly. Demand for data analysts is up by 92%; 25% of IT budgets are spent on the data integration projects required to access the value locked up in this data “ore” – it certainly seems that enterprise is doing The Right Thing – but is it?”
Data is exploding within most enterprises. However, most enterprises have no clue how to manage this data effectively. While you would think that an investment in data integration would be an area of focus, many enterprises don’t have a great track record in making data integration work. “Scratch the surface, and it emerges that 83% of IT staff expect there to be no ROI at all on data integration projects and that they are notorious for being late, over-budget and incredibly risky.”
The core message from me is that enterprises need to ‘up their game’ when it comes to data integration. This recommendation is based upon the amount of data growth we’ve already experienced, and will experience in the near future. Indeed, a “data tsunami” is on the horizon, and most enterprises are ill prepared for it.
So, how do you get prepared? While many would say it’s all about buying anything and everything when it comes to big data technology, the best approach is to splurge on planning. This means defining exactly what data assets are in place now and will be in place in the future, and how they should or will be leveraged.
To face the forthcoming wave of data, certain planning aspects and questions about data integration rise to the top:
- Performance, including data latency: how quickly does the data need to flow from point (or points) A to point (or points) B? As the volume of data quickly rises, the data integration engines have to keep up.
- Data security and governance: how will the data be protected both at rest and in flight, and how will the data be managed in terms of controls on use and change?
- Abstraction, and removing data complexity: how will the enterprise remap and re-purpose key enterprise data that may not currently exist in a well-defined and functional structure? (A toy adapter illustrating this follows the list.)
- Integration with cloud-based data: how will the enterprise link existing enterprise data assets with those that exist on remote cloud platforms?
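To make the abstraction point concrete, here is a toy adapter (all legacy field names are hypothetical) that remaps a poorly structured legacy record into a well-defined canonical shape, so downstream consumers never see the mess:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    full_name: str
    country: str

def from_legacy(row: dict) -> Customer:
    # The legacy system stuffed the name into two inconsistently used fields.
    name = row.get("NM1") or row.get("NM2") or ""
    return Customer(
        customer_id=str(row["CUST_NO"]).zfill(8),
        full_name=" ".join(name.split()).title(),
        country=row.get("CTRY", "US").upper(),
    )

print(from_legacy({"CUST_NO": 42, "NM1": "ann  LEE", "CTRY": "us"}))
# Customer(customer_id='00000042', full_name='Ann Lee', country='US')
```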
While this may seem like a complex and risky process, think through the problems, leverage the right technology, and you can remove the risk and complexity. The enterprises that seem to fail at data integration do not follow that advice.
I suspect the explosion of data will be the biggest challenge enterprise IT faces in many years. While a few will take advantage of their data, most will struggle, at least initially. Which route will you take?
The transition to value-based care is well underway. From healthcare delivery organizations to clinicians, payers, and patients, everyone feels the impact. Each has a role to play. Moving to a value-driven model demands agility from people, processes, and technology. Organizations that succeed in this transformation will be those in which:
- Collaboration is commonplace
- Clinicians and business leaders wear new hats
- Data is recognized as an enterprise asset
The ability to leverage data will differentiate the leaders from the followers. Successful healthcare organizations will:
1) Establish analytics as a core competency
2) Rely on data to deliver best practice care
3) Engage patients and collaborate across the ecosystem to foster strong, actionable relationships
Trustworthy data is required to power the analytics that reveal the right answers, to define best practice guidelines and to identify and understand relationships across the ecosystem. In order to advance, data integration must also be agile. The right answers do not live in a single application. Instead, the right answers are revealed by integrating data from across the entire ecosystem. For example, in order to deliver personalized medicine, you must analyze an integrated view of data from numerous sources. These sources could include multiple EMRs, genomic data, data marts, reference data and billing data.
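As a minimal sketch of what that integration looks like in practice, here is a pandas join across entirely hypothetical tables keyed on a shared master patient id:

```python
import pandas as pd

# Hypothetical extracts from an EMR, a genomics source and a billing system.
emr = pd.DataFrame({"patient_id": [1, 2], "dx_code": ["E11.9", "I10"]})
genomics = pd.DataFrame({"patient_id": [1], "variant": ["BRCA1"]})
billing = pd.DataFrame({"patient_id": [1, 2], "total_billed": [1200.0, 300.0]})

# The integrated view no single application could provide on its own.
unified = (
    emr.merge(genomics, on="patient_id", how="left")
       .merge(billing, on="patient_id", how="left")
)
print(unified)
```

The hard part, of course, is not the join itself but trusting that patient_id really identifies the same person in every source; that is where trustworthy, mastered data earns its keep.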
A recent PwC survey showed that 62% of executives believe data integration will become a competitive advantage. However, a July 2013 InformationWeek survey reported that 40% of healthcare executives gave their organization only a grade of D or F on preparedness to manage the data deluge.
What grade would you give your organization?
You can improve your organization’s grade, but it will require collaboration between business and IT. If you are in IT, you’ll need to collaborate with business users who understand the data. You must empower them with self-service tools for improving data quality and connecting data. If you are a business leader, you need to understand and take an active role with the data.
To take the next step, download our new eBook, “Potential Unlocked: Transforming healthcare by putting information to work.” In it, you’ll learn:
- How to put your information to work
- New ways to govern your data
- What other healthcare organizations are doing
- How to overcome common barriers
So go ahead, download it now and let me know what you think. I look forward to hearing your questions and comments… oh, and your grade!
If you build an IT Architecture, it will be a constant uphill battle to get business users and executives engaged and taking ownership of data governance and data quality. In short, you will struggle to maximize the information potential of your enterprise. But if you develop an Enterprise Architecture that starts with a business and operational view, the dynamics change dramatically. To make this point, let’s take a look at a case study from Cisco. (more…)
“If you don’t like change, you’re going to like irrelevancy a lot less.” I saw this powerful quotation, often attributed to General Eric Shinseki, in an MDM Summit presentation by Dagmar Garcia, senior manager of marketing data management at Citrix. In this interview, Dagmar explains how Citrix is achieving a measurable impact on marketing results by improving the quality of customer and prospect information.
Q: What is Citrix’s mission?
A: Citrix is a $2.6 billion company. We help people work and collaborate from anywhere by easily accessing enterprise applications and data from any device. More than 250,000 organizations around the globe use our solutions and we have over 10,000 partners in 100 countries who resell Citrix solutions.
Q: What are marketing’s goals?
A: We operate in a hyper-competitive market. It’s critical to retain and expand relationships with existing enterprise and SMB customers and attract new ones. The marketing team’s goals are to boost campaign effectiveness and lead-to-opportunity conversion rates, while improving operational efficiencies.
But, it’s difficult to create meaningful customer segments and target them with relevant cross-sell and up-sell offers if marketing lacks access to clean, consistent and connected customer information and visibility into the total customer relationship across product lines.
Q: What is your role in achieving these goals?
A: I’ve been responsible for global marketing data management at Citrix for six years. My role is to identify, implement and maintain technical and business data management processes. I work with marketing leadership, GEO-based team members, sales operations, and operational experts to understand requirements, develop solutions and communicate results. I strive to create innovative solutions to improve the quality of master data at Citrix, including the roll-out and successful adoption of data governance and stewardship practices within Marketing and across other departments.
Q: What drove the decision to tackle inaccurate, inconsistent and disconnected customer and prospect information?
A: In 2011, the quality of customer information and prospect information was identified as the #1 problem by our sales and marketing teams. Account and contact information was incomplete, inaccurate and duplicated in our CRM system.
Another challenge was fragmented and inconsistent master account information scattered across the organization’s multiple applications. It was difficult to know which source had the most accurate and up-to-date customer and prospect information.
To be successful, we needed a single source of the truth, one system of reference where data management best practices were centralized and consistent. This was a requirement to understand the total customer relationship across product lines. We asked ourselves:
- How can we improve campaign effectiveness if more than 40% of the contacts in our customer relationship management system (CRM) are inactive?
- How can we create meaningful customer segments for targeted cross-sell and up-sell offers when we don’t have visibility into all the products they already have?
- How can we improve lead to opportunity conversion rates if we have incomplete prospect data?
- How can we improve operational efficiencies if our rate of duplicate customer and prospect information is double the industry standard?
- How can we maintain high data quality standards in our global operations if we lack the data quality technology and processes needed to be successful?
Q: How are you managing customer and prospect information now?
A: We built a marketing data management foundation. We centralized our data management and reduced manual, error-prone and time-consuming data quality efforts. To decrease the duplicate account and contact rate, we focused on managing the quality of our data as close to the source as possible by improving data validation at points of entry.
Q: What role does Informatica play?
A: We use master data management (MDM) to:
- pull together fragmented customer, prospect and partner information scattered across applications into one central, trusted location where it can be mastered, managed and shared on an ongoing basis,
- organize customer, prospect and partner information so we know how companies and people are related to each other, which hierarchies and networks they belong to, including their roles and organizations, and
- syndicate clean, consistent and connected customer, partner and product information to applications, such as CRM and data warehouses for analytics.
Q: Why did you choose Informatica?
A: After completing a thorough analysis of our gaps, we knew the best solution was a combination of MDM technology and a data governance process. We wanted to empower the business to manage customer information, navigate multiple hierarchies, handle exceptions and make changes with a transparent process through an easy-to-use interface.
At the same time, we did extensive industry research and learned Informatica MDM was ranked as a visionary and thought leader in the master data management solution space and could support our data governance process.
Q: Can you share some of the results you’ve achieved?
A: Now that marketing uses clean, consistent and connected customer and prospect information, with an understanding of the total customer relationship, we’ve seen a positive impact on these key metrics:
↑ 20% lead-to-opportunity conversion rates
↑ 20% operational efficiency
↑ 50% quality data at point of entry
↓ 50% in prospect accounts duplication rate
↓ 50% in creation of duplicate prospect accounts and contacts
↓ 50% in junk data rate
In my first article on the topic of citizens’ digital health and safety, we looked at the states’ desire to keep their citizens healthy and safe, and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem, together with some ideas for effective risk mitigation, are covered in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue); non-production systems of every sort, on the other hand, do not. Non-production systems most often need realistic, but not real, data and realistic, but not real, data volumes (except maybe for the performance/stress/throughput testing system). What needs to be done? Well, a three-point risk remediation plan would be a good place to start.
1. Understand the non-production data: sophisticated data and schema profiling, combined with NLP (Natural Language Processing) techniques, helps to identify previously unrecognized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data, but remains realistic enough for non-production use, and make sure the same masking is applied to the attribute values wherever they appear across tables and files (a sketch of this follows the list).
3. Subset the data to reduce data volumes; this limits the size of the risk and also has positive effects on performance, run-times, backups, etc.
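Here is a minimal, standard-library-only sketch of step 2, deterministic (static) masking: the same real value always maps to the same realistic fake value, so joins across tables and files still line up. A real deployment would use a purpose-built masking tool rather than hand-rolled code:

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # hypothetical masking key; never shipped to non-prod

FAKE_FIRST = ["Alex", "Jordan", "Morgan", "Casey", "Riley", "Quinn"]

def mask_name(real_name: str) -> str:
    # HMAC so the mapping cannot be reversed without the key.
    digest = hmac.new(SECRET, real_name.encode(), hashlib.sha256).digest()
    idx = int.from_bytes(digest[:4], "big")
    return f"{FAKE_FIRST[idx % len(FAKE_FIRST)]}-{idx % 10000:04d}"

def mask_ssn(real_ssn: str) -> str:
    digest = hmac.new(SECRET, real_ssn.encode(), hashlib.sha256).digest()
    n = int.from_bytes(digest[:5], "big")
    # 900-series numbers are never issued as real SSNs.
    return f"900-{n % 100:02d}-{n % 10000:04d}"

# Deterministic: the same input yields the same masked value in every table.
assert mask_ssn("123-45-6789") == mask_ssn("123-45-6789")
print(mask_name("John Smith"), mask_ssn("123-45-6789"))
```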
Gartner has just published their 2013 Magic Quadrant for data masking. It covers both what they call static (i.e. permanent or persistent) masking and dynamic masking (more on this in the next issue). As usual, the MQ gives a good overview of the issues behind the technology, as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that, from the top down, state governments realize the importance and vulnerability of their citizens’ data and put in place a non-partisan plan to prevent future breaches. As the reader might imagine, for any such plan to succeed it needs a combination of cultural and organizational change (getting people to care) and the right technology in place – together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Murphy’s First Law of Bad Data – If You Make A Small Change Without Involving Your Client – You Will Waste Heaps Of Money
I have not used my personal encounter with bad data management for over a year, but a couple of weeks ago I was compelled to revive it. Why, you ask? Well, a complete stranger started to receive text messages meant for a friend of mine – including messages from me – and it took days for my friend to detect it, and a week later nobody at this North American wireless operator had been able to fix it. This coincided with a meeting I had with a European telco’s enterprise architecture team. There was no better way to illustrate to them how a customer reacts, and the risk to their operations, when communication breaks down due to just one tiny thing changing – say, his address (or in the SMS case, some random SIM mapping – another type of address).
In my case, I moved about 250 miles within the United States a couple of years ago and this seemingly common experience triggered a plethora of communication screw ups across every merchant a residential household engages with frequently, e.g. your bank, your insurer, your wireless carrier, your average retail clothing store, etc.
For more than two full years after my move to a new state, the following things continued to pop up on a monthly basis due to my incorrect customer data:
- In the case of my old satellite TV provider, they got to me (the correct person) but with a misspelled last name at my correct, new address.
- My bank put me in a bit of a pickle as they sent “important tax documentation” that I did not want to open, since my new tenants’ names (in the house I just vacated) were on the letter, but with my new home’s address.
- My mortgage lender sends me a refinancing offer to my new address (right person & right address) but with my wife’s name, as well as my own, completely butchered.
- My wife’s airline, where she enjoys the highest level of frequent flyer status, continually mails her offers duplicating her last name as her first name.
- A high-end furniture retailer sends two 100-page glossy catalogs probably costing $80 each to our address – one for me, one for her.
- A national health insurer sends “sensitive health information” (disclosed on envelope) to my new residence’s address but for the prior owner.
- My legacy operator turns on the wrong premium channels on half my set-top boxes.
- The same operator sends me an SMS the next day thanking me for switching to electronic billing as part of my move, which I did not sign up for, followed by payment notices (as I did not get my invoice in the mail). When I called out this error over the next three months, phoning their contact center and indicating how much revenue I generate for them across all services, they countered with “sorry, we don’t have access to the wireless account data”, “you will see it change on the next bill cycle” and “you show as paper billing in our system today”.
Ignoring the potential for data privacy lawsuits, you start wondering how long you have to be a customer, and how much money you need to spend with a merchant (and how much they need to waste), for them to take changes to your data more seriously. And these are not even merchants to whom I am brand new – these guys have known me and taken my money for years!
One thing I nearly forgot… these mailings all happened at least once a month on average, sometimes twice, over two years. If I do some pigeon math here, I estimate the postage and production cost alone ran into the hundreds of dollars.
The most egregious trespass, however, belonged to my homeowner’s insurance carrier (HOI), who was also my mortgage broker. They had a double whammy in store for me. First, I received a cancellation notice from the HOI for my old residence indicating they had cancelled my policy because the last payment was not received, and that any claims would be denied as a consequence. Then, my new residence’s HOI advised that they had added my old home’s HOI to my account.
After wondering what I could have possibly done to trigger this, I called all four parties (not three as the mortgage firm did not share data with the insurance broker side – surprise, surprise) to find out what had happened.
It turns out that I had to explain and prove to all of them how one party’s data change during my move erroneously exposed me to liability. It felt like the old days, when seedy telco salespeople needed only your name and phone number, associated with some sort of promotion (the back of a raffle card to win a new car) you never took part in, to switch your long-distance carrier and present you with a $400 bill the next month. Yes, that also happened to me… many years ago. Here again, the consumer had to do all the legwork when someone (not an automatic process!) switched some entry without any oversight or review, triggering hours of wasted effort on their side and mine.
We can argue all day long about whether these screw-ups are due to bad processes or bad data, but in all reality, even processes are triggered by some sort of underlying event, which can be something as mundane as a database field’s flag being updated when your last purchase puts you in a new marketing segment.
Now imagine you get married and your wife changes her name. With all these company-internal (CRM, Billing, ERP), free public (property tax), commercial (credit bureaus, mailing lists) and social media data sources out there, you would think such everyday changes could get picked up quicker and automatically. If not automatically, then should there not be some sort of trigger to kick off a “governance” process; something along the lines of “email or call the customer if attribute X has changed” or “please log into your account and update your information – we heard you moved”? A toy version of such a trigger is sketched below. If American Express was able to detect ten years ago that someone purchased $500 worth of product with your credit card at a gas station or some lingerie website known for fraudulent activity, why not your bank or insurer, who know even more about you? And yes, that happened to me as well.
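Here is what that trigger might look like in miniature (the record fields and notification hook are hypothetical; a real system would hang this off CRM/MDM change-data-capture rather than ad-hoc code):

```python
WATCHED = {"address", "last_name", "sim_mapping"}

def detect_changes(old: dict, new: dict) -> set:
    """Return the watched attributes whose values changed."""
    return {k for k in WATCHED if old.get(k) != new.get(k)}

def notify_customer(customer_id: str, attrs: set) -> None:
    # Stand-in for an email/call governance workflow step.
    print(f"Verify with customer {customer_id}: {sorted(attrs)} changed")

def on_update(old: dict, new: dict) -> None:
    changed = detect_changes(old, new)
    if changed:
        # Kick off a governance step instead of silently overwriting.
        notify_customer(new["customer_id"], changed)

on_update({"customer_id": "42", "address": "12 Old Rd"},
          {"customer_id": "42", "address": "99 New Ave"})
```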
So tell me: what is one of your “data-driven” horror scenarios?
I’m excited to share that, since its launch in January 2013, the GovernYourData.com community has been very well received. With over 4,700 unique visitors and nearly 600 registered members, many data management practitioners recognize it as a valuable go-to resource to support their data governance efforts. While maintaining our core objective of vendor- and product-neutrality, the site offers over 100 best practice blog posts from over 17 different contributors, shares the details on a dozen upcoming industry events, and has links to a wide variety of white papers, analyst research, recommended books, and other educational resources. (more…)
There are three reasons why we haven’t achieved 1-click data management in a corporate data marketplace. First, it wasn’t a problem until recently. The signs that we really needed to manage data as an asset across the enterprise only appeared about 20 years ago. Prior to that, data management occurred at the application system level and we didn’t need a separate focus on Information Asset Management (IAM) at the enterprise level. The past five years, however, have seen strong, growing awareness of the challenges and the need for IAM, driven to a large degree by big-data opportunities and data privacy and confidentiality concerns. (more…)