Category Archives: Data masking
In response to the growth, organizations seek new ways to unlock the value of their data. Traditionally, data has been analyzed for a few key reasons. First, data was analyzed in order to identify ways to improve operational efficiency. Secondly, data was analyzed to identify opportunities to increase revenue.
As data expands, companies have found new uses for these growing data sets. Of late, organizations have started providing data to partners, who then sell the ‘intelligence’ they glean from within the data. Consider a coffee shop owner whose store doesn’t open until 8 AM. This owner would be interested in learning how many target customers (Perhaps people aged 25 to 45) walk past the closed shop between 6 AM and 8 AM. If this number is high enough, it may make sense to open the store earlier.
As much as organizations prioritize the value of data, customers prioritize the privacy of data. If an organization loses a customer’s data, it results in a several costs to the organization. These costs include:
- Damage to the company’s reputation
- A reduction of customer trust
- Financial costs associated with the investigation of the loss
- Possible governmental fines
- Possible restitution costs
To guard against these risks, data that organizations provide to their partners must be obfuscated. This protects customer privacy. However, data that has been obfuscated is often of a lower value to the partner. For example, if the date of birth of those passing the coffee shop has been obfuscated, the store owner may not be able to determine if those passing by are potential customers. When data is obfuscated without consideration of the analysis that needs to be done, analysis results may not be correct.
There is away to provide data privacy for the customer while simultaneously monetizing enterprise data. To do so, organizations must allow trusted partners to define masking generalizations. With sufficient data masking governance, it is indeed possible for data obfuscation and data value to coexist.
Currently, there is a great deal of research around ensuring that obfuscated data is both protected and useful. Techniques and algorithms like ‘k-Anonymity’ and ‘l-Diversity’ ensure that sensitive data is safe and secure. However, these techniques have have not yet become mainstream. Once they do, the value of big data will be unlocked.
We can all imagine self-driving cars that distinguish between a life-threatening situation (like a swerving car ahead) or a thing-threatening occurrence (like a scurrying raccoon) and brake and steer accordingly. And we expect automated picking systems will soon know — by a SKU’s size, shape, weight and temperature — which assembly line or packing area gets which products. And it won’t be long before enterprise systems will see and plug security holes across hundreds of systems, no matter whether the data is hosted internally or held by partners and suppliers.
The underpinning for such smarts is data that’s clean, safe and connected — the hallmarks of everything we do and believe in at Informatica. But we also recognize that next-generation products need something more. They also need to know when and where data changes, along with how to get the right data to the right person, place or thing, in the right way. That’s why Informatica is unveiling our vision for an Intelligent Data Platform, fueled by new technology innovations in data intelligence.
Data intelligence is built on two new capabilities – live data map and inference engine. Live data map continuously updates all the metadata—structural, semantic, usage and otherwise— on all of the data flowing through an enterprise, while the inference engine can deduce user intentions, help humans search for what they need in their own natural language, and provide recommendations on the best way to consume data depending on the use case. The combination ensures that clean, safe and connected data gets to whomever or whatever needs it, as it’s needed—fast.
We at Informatica believe these capabilities are so incredibly vital for the enterprise that the Intelligent Data Platform now serves as the foundation of many of our future products — beginning with Project Springbok and Project Secure@Source™. These two new offerings simplify some of the toughest challenges facing people in the enterprise: letting business users find and use the data they need, and seeing where their most-sensitive data is hiding amidst all the nooks and crannies.
Project Springbok’s Excel-like interface lets everyday business folks and mere mortals find the data sets they’re interested in, fix formatting and quality issues, and do tasks that are a pain today to perform — such as combining data sets or publishing the results for colleagues to reuse and enhance. Project Springbok is also a guide, with its recommendations derived by the inference engine. It tells users the sources they could or should have access to, and then provisions only what they should have. It lets users see which data sets colleagues are most frequently accessing and finding the most valuable. It also alerts users to inconsistent or incomplete data, suggests ways to sort new combinations of data sets and recommends the best data for the task.
While we designed Project Springbok for the average business user, Project Secure@Source is intended for people responsible for protecting the enterprise, including chief risk officers, chief information security officers (CISOs) and even board members of public companies. That’s because Project Secure@Source’s graphical interface displays all the systems holding sensitive data, such as social security numbers, medical records or payment card information.
But it’s not enough just to know where that data is. To safeguard all the sensitive information about their products, their customers, and their employees, users also need to understand how that data got into these systems, how it moves around, and who is using it. Project Secure@Source does that, too — showing, for example, that an engineer used payment card data to test a Hadoop cluster, and left it there. With Project Secure@Source, users can selectively remove or mask that data from any system in the enterprise.
You’ll hear us talk about and showcase the Intelligent Data Platform, Project Springbok and Project Secure@Source at Informatica World on May 13 and 14. I hope you’ll join us to learn how our vision and our product roadmap will enable a smarter world for all of us, today.
I recently met with a longtime colleague from the Oracle E-Business Suite implementation eco-system, now VP of IT for a global technology provider. This individual has successfully implemented data archiving and data masking technologies to eliminate duplicate applications and control the costs of data growth – saving tens of millions of dollars. He has freed up resources that were re-deployed within new innovative projects such as Big Data – giving him the reputation as a thought leader. In addition, he has avoided exposing sensitive data in application development activities by securing it with data masking technology – thus securing his reputation.
When I asked him about those projects and the impact on his career, he responded, ‘Data archiving and data security are table stakes in the Oracle Applications IT game. However, if I want to be a part of anything important, it has to involve Cloud and Big Data.’ He further explained how the savings achieved from Informatica Data Archive enabled him to increase employee retention rates because he was able to fund an exciting Hadoop project that key resources wanted to work on. Not to mention, as he transitioned from physical infrastructure to a virtual server by retiring legacy applications – he had accomplished his first step on his ‘journey to the cloud’. This would not have been possible if his data required technology that was not supported in the cloud. If he hadn’t secured sensitive data and had experienced a breach, he would be looking for a new job in a new industry.
Not long after, I attended a CIO summit where the theme of the conference was ‘Breakthrough Innovation’. Of course, Cloud and Big Data were main stage topics – not just about the technology, but about how it was used to solve business challenges and provide services to the new generation of ‘entitled’ consumers. This is the description of those who expect to have everything at their fingertips. They want to be empowered to share or not share their information. They expect that if you are going to save their personal information, it will not be abused. Lastly, they may even expect to try a product or service for free before committing to buy.
In order to size up to these expectations, Application Owners, like my long-time colleague, need to incorporate Data Archive and Data Masking in their standard SDLC processes. Without Data Archive, IT budgets may be consumed by supporting old applications and mountains of data, thereby becoming inaccessible for new innovative projects. Without Data Masking, a public breach will drive many consumers elsewhere.
For the past few years, the press has been buzzing about the potential value of Big Data. However, there is little coverage focusing on the data itself – how do you get it, is it accurate, and who can be trusted with it?
We are the source of data that is often spoken about – our children, friends and relatives and especially those people we know on Facebook or LinkedIn. Over 40% of Big Data projects are in the sales and marketing arena – relying on personal data as a driving force. While machines have no choice but to provide data when requested, people do have a choice. We can choose not to provide data, or to purposely obscure our data, or to make it up entirely.
So, how can you ensure that your organization is receiving real information? Active participation is needed to ensure a constant flow of accurate data to feed your data-hungry algorithms and processes. While click-stream analysis does not require individual identification, follow-up sales & marketing campaigns will have limited value if the public at large is using false names and pretend information.
BCG has identified a link between trust and data sharing:
“We estimate that those that manage this issue well [creating trust] should be able to increase the amount of consumer data they can access by at least five to ten times in most countries.”[i]
With that in mind, how do you create the trust that will entice people to share data? The principles behind common data privacy laws provide guidelines. These include: accountability, purpose identification and disclosure, collection with knowledge and consent, data accuracy, individual access and correction, as well as the right to be forgotten.
But there are challenges in personal data stewardship – in part because the current world of Big Data analysis is far from stable. In the ongoing search for the value of Big Data, new technologies, tools and approaches are being piloted. Experimentation is still required which means moving data around between data storage technologies and analytical tools, and giving unprecedented access to data in terms of quantity, detail and variety to ever growing teams of analysts. This experimentation should not be discouraged, but it must not degrade the accuracy or security of your customers’ personal data.
How do you measure up? If I made contact and asked for the sum total of what you knew about me, and how my data was being used – how long would it take to provide this information? Would I be able to correct my information? How many of your analysts can view my personal data and how many copies have you distributed in your IT landscape? Are these copies even accurate?
Through our data quality, data mastering and data masking tools, Informatica can deliver a coordinated approach to managing your customer’s personal data and build trust by ensuring the safety and accuracy of that data. With Informatica managing your customer’s data, your internal team can focus their attention on analytics. Analytics from accurate data can help develop the customer loyalty and engagement that is vital to both the future security of your business and continued collection of accurate data to feed your Big Data analysis.
[i] The Trust Advantage: How to Win with Big Data; bcg.perspectives November 2013
- The RSA conference took place in San Francisco from February 24-28, 2014
- The IAPP Global Privacy Summit took place Washington, DC from March 5-7, 2014
Data Privacy at the 2014 RSA Conference
The RSA conference was busy as expected, with over 30,000 attendees. Informatica co-sponsored an after-hours event with one of our partners, Imperva, at the Dark Circus. The event was standing room only and provided a great escape from the torrential rain. One highlight of RSA, for Informatica, is that we were honored with two of the 2014 Security Products Guide Awards:
- Informatica Dynamic Data Masking won the Gold Award for Database Security, Data Leakage Prevention/Extrusion Prevention
- Informatica Cloud Test Data Management and Security won the Bronze Award for New Products
Of particular interest to us was the growing recognition of data-centric security and privacy at RSA. I briefly met Bob Rudis, co-author of “Data Driven Security” which was featured at the onsite bookstore. In the book, Rudis has presented a great case for focusing on data as the center-point of security, through data analysis and visualization. From Informatica’s perspective, we also believe that a deep understanding of data and its relationships will escalate as a key driver of security policies and measures.
Data Privacy at the IAPP Global Privacy Summit
The IAPP Global Privacy Summit was an amazing event, small (2,500), but completely sold-out and overflowing its current venue. We exhibited and had the opportunity to meet CPOs, privacy, risk/compliance and security professionals from around the world, and had hundreds of conversations about the role of data discovery and masking for privacy. From the privacy perspective, it is all about finding, de-identification and protection of PII, PCI and PHI. These privacy professionals have extensive legal and/or data security backgrounds and understand the need to safeguard privacy by using data masking. Many notable themes were present at IAPP:
- De-identification is a key topic area
- Concerns about outsourcing and contractors in application development and testing have driven test data management adoption
- No national US privacy regulations expected in the short-term
- Europe has active but uneven privacy enforcement (France: “name and shame”, UK: heavy fines, Spain; most active)
If you want to learn more about data privacy and security, you will find no better place than Informatica World 2014. There, you’ll learn about the latest data security trends, see updates to Informatica’s data privacy and security offerings, and find out how Informatica protects sensitive information in real time without requiring costly, time-consuming changes to applications and databases. Register TODAY!
Are you interested in Oracle Data Migration Best Practices? Are you upgrading, consolidating or migrating to or from an Oracle application? Moving to the cloud or a hosted service? Research and experience confirms that the tasks associated with migrating application data during these initiatives have the biggest impact on whether the project is considered a failure or success. So how do your peers ensure data migration success?
Informatica will be offering a full day Oracle Migrations Best Practices workshop at Oracle Application User Group’s annual conference, Collaborate 14, this year on April 7th in Las Vegas, NV. During this workshop, peers and experts will share best practices for how to avoid the pitfalls and ensure successful projects, lowering migration cost and risk. Our full packed agenda includes:
- Free use and trials of data migration tools and software
- Full training sessions on how to integrate cloud-based applications
- How to provision test data using different data masking techniques
- How to ensure consistent application performance during and after a migration
- A review of Oracle Migration Best Practices and case studies
Case Study: EMC
One of the key case studies that will be highlighted is EMC’s Oracle migration journey. EMC Corporation migrated to Oracle E-Business Suite, acquired more than 40 companies in 4 years, consolidated and retired environments, and is now on its path to migrating to SAP. Not only did they migrate applications, but they also migrated their entire technology platform from physical to virtual on their journey to the cloud. They needed to control the impact of data growth along the way, manage the size of their test environments while reducing the risk of exposing sensitive data to unauthorized users during development cycles. With best practices, and the help from Informatica, they estimate that they have saved approximately $45M in IT cost savings throughout their migrations. Now that they are deploying a new analytics platform based on Hadoop. They are leveraging existing skill sets and Informatica tools to ensure data is loaded into Hadoop without missing a beat.
Case Study: Verizon
Verizon is the second case study we will be discussing. They recently migrated to Salesforce.com and needed to ensure that more than 100 data objects were integrated with on-premises, back end applications. In addition, they needed to ensure that data was synchronized and kept secure in non-production environments in the cloud. They were able to leverage a cloud-based integration solution from Informatica to simplify their complex IT application architecture and maintain data availability and security – all while migrating a major business application to the cloud.
Case Study: OEM Heavy Equipment Manufacturer
The third case study we will review involves a well-known heavy equipment manufacturer who was facing a couple of challenges – the first was a need to separate data in in an Oracle E-Business Suite application as a result of a divestiture. Secondly, they also needed to control the impact of data growth on their production application environments that were going through various upgrades. Using an innovative approach based on Smart Partitioning, this enterprise estimates it will save $23M over a 5 year period while achieving 40% performance improvements across the board.
To learn more about what Informatica will be sharing at Collaborate 14, watch this video. If you are planning to attend Collaborate 14 this year and you are interested in joining us, you can register for the Oracle Migrations Best Practices Workshop here.
In the first two issues I spent time looking at the need for states to pay attention to the digital health and safety of their citizens, followed by the oft forgotten need to understand and protect the non-production data. This is data than has often proliferated and also ignored or forgotten about.
In many ways, non-production data is simpler to protect. Development and test systems can usually work effectively with realistic but not real PII data and realistic but not real volumes of data. On the other hand, production systems need the real production data complete with the wealth of information that enables individuals to be identified – and therefore presents a huge risk. If and when that data is compromised either deliberately or accidentally the consequences can be enormous; in the impact on the individual citizens and also the cost of remediation on the state. Many will remember the massive South Carolina data breach of late 2012 when over the course of 2 days a 74 GB database was downloaded and stolen, around 3.8 million payers and 1.9 million dependents had their social security information stolen and 3.3 million “lost” bank account details. The citizens’ pain didn’t end there, as the company South Carolina picked to help its citizens seems to have tried to exploit the situation.
The biggest problem with securing production data is that there are numerous legitimate users and uses of that data, and most often just a small number of potentially malicious or accidental attempts of inappropriate or dangerous access. So the question is… how does a state agency protect its citizens’ sensitive data while at the same time ensuring that legitimate uses and users continues – without performance impacts or any disruption of access? Obviously each state needs to make its own determination as to what approach works best for them.
This video does a good job at explaining the scope of the overall data privacy/security problems and also reviews a number of successful approaches to protecting sensitive data in both production and non-production environments. What you’ll find is that database encryption is just the start and is fine if the database is “stolen” (unless of course the key is stolen along with the data! Encryption locks the data away in the same way that a safe protects physical assets – but the same problem exists. If the key is stolen with the safe then all bets are off. Legitimate users are usually easily able deliberately breach and steal the sensitive contents, and it’s these latter occasions we need to understand and protect against. Given that the majority of data breaches are “inside jobs” we need to ensure that authorized users (end-users, DBAs, system administrators and so on) that have legitimate access only have access to the data they absolutely need, no more and no less.
So we have reached the end of the first series. In the first blog we looked at the need for states to place the same emphasis on the digital health and welfare of their citizens as they do on their physical and mental health. In the second we looked at the oft-forgotten area of non-production (development, testing, QA etc.) data. In this third and final piece we looked at the need to and some options for providing the complete protection of non-production data.
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue), on the other hand non-production systems of every sort do not. Non-production systems most often need realistic, but not real data and realistic, but not real data volumes (except maybe for the performance/stress/throughput testing system). What need to be done? Well to start with, a three point risk remediation plan would be a good place to start.
1. Understand the non-production data using sophisticated data and schema profiling combined with NLP (Natural Language Processing) techniques help to identify previously unrealized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes, this limits the size of the risk and also has positive effects on performance, run-times, backups etc.
Gartner has just published their 2013 magic quadrant for data masking this covers both what they call static (i.e. permanent or persistent masking) and dynamic (more on this in the next issue) masking. As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that from the top down state governments realize the importance and vulnerability of their citizens data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to success needs a combination of cultural and organizational change (getting people to care) and putting the right technology – together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Informatica announced, once again, that it is listed as a leader in the industry’s second Gartner Magic Quadrant for Data Masking Technology. With data security continuing to grow as one of the fastest segments in the enterprise software market, technologies such as data masking are becoming the solution of choice for data-centric security.
Increased fear of cyber-attacks and internal data breaches has made predictions that 2014 is the year of preventative and tactical measures to ensure corporate data assets are safe. Data masking should be included in those measures. According to Gartner,
“Security program managers need to take a strategic approach with tactical best-practice technology configurations in order to properly address the most common advanced targeted attack scenarios to increase both detection and prevention capabilities.”
Without these measures, the cost of an attack or breach is growing every year. The Ponemon Institute posted in a recent study:
“The 2013 Cost of Cyber Crime Study states that the average annualized cost of cybercrime incurred by a benchmark sample of US organizations was $11.56 million, nearly 78% more than the cost estimated in the first analysis conducted 4 years ago.”
Informatica believes that the best preventative measures include a layered approach for data security but without sacrificing agility or adding unnecessary costs. Data Masking delivers data-centric security with improved productivity and reduced overall costs.
Data Masking prevents internal data theft and abuse of sensitive data by hiding it from users. Data masking techniques include replacing some fields with similar-looking characters, masking characters (for example, “x”), substituting real last names with fictional last names and shuffling data within columns – to name a few. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization . Call it what you like, but without it – organizations may continue to expose sensitive data to those with mal intentions.
To learn more, Download the Gartner Magic Quadrant Data Masking Report now. And visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose
This is the first in a series of articles where I will take an in-depth look at how state and local governments are affected by data breaches and what they should be considering as part of their compliance, risk-avoidance and remediation plans.
Each state has one or more agencies that are focused on the lives, physical and mental health and overall welfare of their citizens. The mission statement of the Department of Public Welfare of Pennsylvania, my home state is typical, it reads “Our vision is to see Pennsylvanians living safe, healthy and independent lives. Our mission is to improve the quality of life for Pennsylvania’s individuals and families. We promote opportunities for independence through services and supports while demonstrating accountability for taxpayer resources.”
Just as in the enterprise, over the last couple of decades the way an agency deals with citizens has changed dramatically. No longer is everything paper-based and manually intensive – each state has made enormous efforts not just to automate more and more of their processes but more lately to put everything online. The combination of these two factors has led to the situation where just about everything a state knows about each citizen is stored in numerous databases, data warehouses and of course accessed through the Web.
It’s interesting that in the PA mission statement two of the three focus areas are safety and health– I am sure when written these were meant in the physical sense. We now have to consider what each state is doing to safeguard and promote the digital safety and health of its citizens. You might ask what digital safety and health means – at the highest level this is quite straightforward – it means that each state must ensure the data it holds about its’ citizens is safe from inadvertent or deliberate exposure or disclosure. It seems that each week we read about another data breach – high profile data breach infographic - either accidental (a stolen laptop for instance) or deliberate (hacking as an example) losses of data about people – the citizens. Often that includes data contents that can be used to identify the individuals, and once an individual citizen is identified they are at risk of identity theft, credit card fraud or worse.
Of the 50 states, 46 now have a series of laws and regulations in place about when and how they need to report on data breaches or losses – this is all well and good, but is a bit like shutting the stable door after the horse has bolted – but with higher stakes as there are potentially dire consequences to the digital safety and health of their citizens.
In the next article I will look at the numerous areas that are often overlooked when states establish and execute their data protection and data privacy plans.