Tag Archives: data masking
I live in a very small town in Maine. I don’t spend a lot of time thinking about my privacy. Some would say that by living in a small town, you give up your right to privacy because everyone knows what everyone else is doing. Living here is a choice – for me to improve my family’s quality of life. Sharing all of the details of my life – not so much.
When I go to my doctor (who also happens to be a parent from my daughter’s school), I fully expect that any sort of information that I share with him, or that he obtains as a result of lab tests or interviews, or care that he provides is not available for anyone to view. On the flip side, I want researchers to be able to take my lab information combined with my health history in order to do research on the effectiveness of certain medications or treatment plans.
As a result of this dichotomy, Congress (in 1996) started to address governance regarding the transmission of this type of data. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a Federal law that sets national standards for how health care plans, health care clearinghouses, and most health care providers protect the privacy of a patient’s health information. With certain exceptions, the Privacy Rule protects a subset of individually identifiable health information, known as protected health information or PHI, that is held or maintained by covered entities or their business associates acting for the covered entity. PHI is any information held by a covered entity which concerns health status, provision of health care, or payment for health care that can be linked to an individual.
Many payers have this type of data in their systems (perhaps in a Claims Administration system), and have the need to share data between organizational entities. Do you know if PHI data is being shared outside of the originating system? Do you know if PHI is available to resources that have no necessity to access this information? Do you know if PHI data is being shared outside your organization?
If you can answer yes to each of these questions – fantastic. You are well ahead of the curve. If not – you need to start considering solutions that can:
- Identify PHI in all of your data streams
- Monitor and track the flow of this data throughout your organization and
- Mask this data if it is being shared with resources that don’t need to be able to identify the individual.
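As a rough illustration of those three steps, here is a minimal Python sketch. The regular expressions, field names, and the `share` helper are all hypothetical and far narrower than what real PHI discovery requires; the point is only to show identification, a simple flow log, and masking working together.

```python
import re

# Hypothetical patterns for illustration only; real PHI discovery needs far
# broader rules (names, addresses, dates, record numbers, and so on).
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def find_phi(text):
    """Identify: return a sorted list of PHI categories detected in a value."""
    return sorted(name for name, pat in PHI_PATTERNS.items() if pat.search(text))

def mask_phi(text):
    """Mask: replace every detected PHI value with a category placeholder."""
    for name, pat in PHI_PATTERNS.items():
        text = pat.sub(f"[{name.upper()}]", text)
    return text

def share(record, recipient_needs_identity):
    """Track which fields carry PHI, then mask them unless identity is needed."""
    flow_log = {field: find_phi(value) for field, value in record.items()}
    if recipient_needs_identity:
        return dict(record), flow_log
    return {field: mask_phi(value) for field, value in record.items()}, flow_log
```

A researcher who only needs lab values would call `share(record, False)` and receive the record with identifiers replaced, while the flow log records which fields contained PHI.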
I want researchers to have access to medically relevant data so they can find the cures to some horrific diseases. I want to feel comfortable sharing health information with my doctor. I want to feel comfortable that my health insurance company is respecting my privacy. Now to get my kids to stop oversharing.
In the report, Gartner states: “Global-scale scandals around sensitive data losses have highlighted the need for effective data protection, especially from insider attacks. Data masking, which is focused on protecting data from insiders and outsiders, is a must-have technology in enterprises’ and governments’ security portfolios.”
Organizations realize that data protection must be hardened to protect against the inevitable breach, whether it originates from internal or external threats. Data masking covers gaps in data protection in production and non-production environments that can be exploited by attackers.
Informatica customers are elevating the importance of data security initiatives in 2015 given the high exposure of recent breaches and the shift from just stealing identities and intellectual property, to politically charged platforms. This raises the concern that existing security controls are insufficient and a more data-centric security approach is necessary.
Recent enforcement by the Federal Trade Commission in the US and emerging legislation worldwide have clearly indicated that sensitive data access and sharing should be tightly controlled; this is the strength of data masking.
Data Masking de-identifies and/or de-sensitizes private and confidential data by hiding it from those who are unauthorized to access it. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization.
To learn more, download the Gartner Magic Quadrant Data Masking Report now, and visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part II
- Do you need to protect data at rest (in storage), during transmission, and/or when accessed?
- Do some privileged users still need the ability to view the original sensitive data or does sensitive data need to be obfuscated at all levels?
- What is the granularity of controls that you need?
  - Datafile level
  - Table level
  - Row level
  - Field / column level
  - Cell level
- Do you need to be able to control viewing vs. modification of sensitive data?
- Do you need to maintain the original characteristics / format of the data (e.g. for testing, demo, development purposes)?
- Is response time latency / performance of high importance for the application? This can be the case for mission critical production applications that need to maintain response times in the order of seconds or sub-seconds.
In order to help you determine which method of control is appropriate for your requirements, the following table provides a comparison of the different methods and their characteristics.
A combination of protection methods may be appropriate based on your requirements. For example, to protect data in non-production environments, you may want to use persistent data masking to ensure that no one has access to the original production data, since they don’t need to. This is especially true if your development and testing is outsourced to third parties. In addition, persistent data masking allows you to maintain the original characteristics of the data to ensure test data quality.
In production environments, you may want to use a combination of encryption and dynamic data masking. This is the case if you would like to ensure that all data at rest is protected against unauthorized users, and sensitive fields need to be masked only for certain sets of authorized or privileged users, while the rest of your users should be able to view the data in the clear.
The best method or combination of methods will depend on each scenario and set of requirements for your environment and organization. As with any technology and solution, there is no one size fits all.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part I
- Which types of data should be protected?
- Which data should be classified as “sensitive?”
- Where is this sensitive data located?
- Which groups of users should have access to this data?
Because these questions come up frequently, it seems ideal to share a few guidelines on this topic.
When protecting the confidentiality and integrity of data, the first level of defense is authentication and access control. However, data with higher levels of sensitivity or confidentiality may require additional levels of protection, beyond regular authentication and authorization methods.
There are a number of control methods for securing sensitive data available in the market today, including:
- Encryption
- Persistent (Static) Data Masking
- Dynamic Data Masking
- Tokenization
- Retention management and purging
Encryption is a cryptographic method of encoding data. There are generally two methods of encryption: symmetric (using a single secret key) and asymmetric (using public and private keys). Although there are methods of deciphering encrypted information without possessing the key, a good encryption algorithm makes it very difficult to decode the encrypted data without knowledge of the key. Key management is usually a major concern with this method of control. Encryption is ideal for mass protection of data (e.g. an entire data file, table, partition, etc.) against unauthorized users.
Persistent or static data masking obfuscates data at rest in storage. There is usually no way to retrieve the original data – the data is permanently masked. There are multiple techniques for masking data, including: shuffling, substitution, aging, encryption, domain-specific masking (e.g. email address, IP address, credit card, etc.), dictionary lookup, randomization, etc. Depending on the technique, there may be ways to perform reverse masking – this should be used sparingly. Persistent masking is ideal for cases where no user should see the original sensitive data (e.g. for test / development environments) and field level data protection is required.
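To make a few of these techniques concrete, here is a minimal Python sketch of shuffling, dictionary substitution, and domain-specific masking of a card number. The function names and the fixed seed are assumptions made for illustration; they do not represent how any particular masking product works.

```python
import random

def shuffle_column(values, seed=0):
    """Shuffling: keep the same set of values but break the link to each row."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def substitute_name(name, dictionary, seed=0):
    """Dictionary lookup: swap a real name for one drawn from a lookup list.

    Seeding with the input keeps the substitution repeatable for the same name.
    """
    return random.Random(f"{seed}:{name}").choice(dictionary)

def mask_card_number(card):
    """Domain-specific masking: hide all but the last four digits, keep the format."""
    digits_total = sum(ch.isdigit() for ch in card)
    seen = 0
    out = []
    for ch in card:
        if ch.isdigit():
            seen += 1
            # Only the final four digits stay visible.
            out.append(ch if seen > digits_total - 4 else "X")
        else:
            out.append(ch)  # separators are preserved so the format survives
    return "".join(out)
```

Because the masked values are written back to storage in place of the originals, a test or development copy treated this way no longer contains the real data.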
Dynamic data masking de-identifies data when it is accessed. The original data is still stored in the database. Dynamic data masking (DDM) acts as a proxy between the application and database and rewrites the user / application request against the database depending on whether the user has the privilege to view the data or not. If the requested data is not sensitive or the user is a privileged user who has the permission to access the sensitive data, then the DDM proxy passes the request to the database without modification, and the result set is returned to the user in the clear. If the data is sensitive and the user does not have the privilege to view the data, then the DDM proxy rewrites the request to include a masking function and passes the request to the database to execute. The result is returned to the user with the sensitive data masked. Dynamic data masking is ideal for protecting sensitive fields in production systems where application changes are difficult or disruptive to implement and performance / response time is of high importance.
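The request rewriting described above can be sketched in a few lines of Python. This is a toy stand-in for a DDM proxy, not a real one: the policy tables, role names, and the `members` table are hypothetical, SQLite is used only so the example is self-contained, and a real proxy would parse SQL rather than build it from a column list.

```python
import sqlite3

# Hypothetical policy: which columns are sensitive and which roles may see them.
SENSITIVE_COLUMNS = {"ssn"}
PRIVILEGED_ROLES = {"claims_auditor"}

def rewrite_select(columns, table, role):
    """Build the SELECT the proxy forwards, masking sensitive columns if needed."""
    select_list = []
    for col in columns:
        if col in SENSITIVE_COLUMNS and role not in PRIVILEGED_ROLES:
            # Masking function executed by the database: keep the last 4 characters.
            select_list.append(f"'***-**-' || substr({col}, -4) AS {col}")
        else:
            select_list.append(col)
    # String building is for illustration only; identifiers here are trusted.
    return f"SELECT {', '.join(select_list)} FROM {table}"

def run_as(conn, role, columns, table):
    """Stand-in for the proxy: rewrite the request, then execute it."""
    return conn.execute(rewrite_select(columns, table, role)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO members VALUES ('Pat Doe', '123-45-6789')")
```

Calling `run_as(conn, "support_agent", ["name", "ssn"], "members")` returns the masked value, while the same call with the privileged role returns the data in the clear. The stored row is never changed in either case.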
Tokenization substitutes a sensitive data element with a non-sensitive data element or token. First generation tokenization systems require a token server and a database to store the original sensitive data. The mapping from the clear text to the token makes it very difficult to reverse the token back to the original data without the token system. However, the existence of a token server and a database storing the original sensitive data makes them a potential point of security vulnerability, a bottleneck for scalability, and a single point of failure. Next generation tokenization systems have addressed these weaknesses. Even so, tokenization does require changes to the application layer to tokenize and detokenize when the sensitive data is accessed. Tokenization can be used in production systems to protect sensitive data at rest in the database store, when changes to the application layer can be made relatively easily to perform the tokenization / detokenization operations.
Retention management and purging is more of a data management method to ensure that data is retained only as long as necessary. The best method of reducing data privacy risk is to eliminate the sensitive data. Therefore, appropriate retention, archiving, and purging policies should be applied to reduce the privacy and legal risks of holding on to sensitive data for too long. Retention management and purging is a data management best practice that should always be put to use.
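A first generation token system of the kind described here can be sketched as a simple lookup table. The `TokenVault` class and the `tok_` prefix are hypothetical names chosen for this example, and the in-memory dictionaries stand in for the token server and its database.

```python
import secrets

class TokenVault:
    """Toy token server: a lookup table maps random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return the existing token for a value, or mint a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look the original value back up; only the vault can do this."""
        return self._token_to_value[token]
```

The token is random, so it cannot be reversed without the vault. The sketch also makes the weakness visible: everything depends on the one object that holds the mapping.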
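As a small illustration of a retention rule, the Python sketch below splits records into those still inside the retention window and those due for purging. The `created` field name and the retention period are assumptions for the example; real policies usually vary by record type and legal requirement.

```python
from datetime import date, timedelta

def purge_expired(records, retention_days, today):
    """Split records into those still within retention and those to purge."""
    cutoff = today - timedelta(days=retention_days)
    kept = [r for r in records if r["created"] >= cutoff]
    purged = [r for r in records if r["created"] < cutoff]
    return kept, purged
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test and to audit.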
CIOs and CFOs both dig data security
In my discussions with CIOs over the last couple of months, I asked them about the importance of a series of topics. All of them placed data security at the top of their IT priority list. Even their CFO counterparts, with whom they do not always see eye to eye, said they were very concerned about the business risk to corporate data. These CFOs said that, as a part of owning business risk, they deal with security, especially the risk of hacking. One CFO said that he also worried about the impact of data security on compliance, including HIPAA and SOX. Another said this: “The security of data is becoming more and more important. The auditors are going after this. CFOs, for this reason, are really worried about getting hacked. This is a whole new direction, but some of the highly publicized recent hacks have scared a lot of folks and they combined represent to many of us a watershed event.”
According to David W. Owens, the editor of CFO Magazine, “even if you are using ‘secure’ storage, such as internal drives and private clouds, the access to these areas can be anything but secure. Practically any employee can be carrying around sensitive financial and performance data in his or her pocket, at any time.” Obviously, new forms of data access have created new forms of data risk.
Are some retailers really leaving the keys in the ignition?
Given this like-mindedness among CIOs and CFOs, I was shocked to learn that some of the recently hacked retailers had been using outdated security software, which may have given hackers easier access to company payment data systems. Most amazingly, some retailers had not even encrypted their customer payment data. Because of this, hackers were able to hide on the network for months and steal payment data, as customers continued to use their credit cards at the company’s point of sale locations.
Why weren’t these transactions encrypted or masked? In my 1998 financial information start-up, we encrypted our databases to protect against hacks of our customers’ personal financial data. One answer came from a discussion with a Fortune 100 Insurance CIO. This CIO said “CIO’s/CTO’s/CISO’s struggle with selling the value of these investment because the C Suite is only interested in hearing about investments with a direct impact on business outcomes and benefits”.
Enterprise security drives enterprise brand today
So how should leaders better argue the business case for security investments? I want to suggest that the value of IT is its “brand promise”. For retailers, in particular, if a past purchase decision creates a perceived personal data security risk, IT becomes a liability to their corporation’s brand equity and potentially creates a negative impact on future sales. Increasingly, how these factors are managed either supports or undermines the value of a company’s brand.
My message is this: Spend whatever it takes to protect your brand equity; otherwise a security issue will become a revenue issue.
In sum, this means organizations that want to differentiate themselves and avoid becoming a brand liability need to further invest in their data centric security strategy and, of course, encryption. The game is no longer just about securing particular applications. IT organizations need to take a data centric approach to securing customer data and other types of enterprise data. Enterprise level data governance rules need to be a requirement. A data centric approach can mitigate business risk by helping organizations to understand where sensitive data is and to protect it in motion and at rest.
The information security industry is reporting that more than 1.5 billion (yes, that’s with a “B”) emails and passwords have been hacked. It’s hard to tell from the article, but this could be the big one. (And just when we thought that James Bond had taken care of the Russian mafia.) From both large and small companies, nobody is safe. According to the experts the sites ranged from small e-commerce sites to Fortune 500 companies. At this time the experts aren’t telling us who the big targets were. We could be very unpleasantly surprised.
Most security experts admit that the bulk of the post-breach activity will be email spamming. Insidious to be sure. But imagine if the hackers were to get a little more intelligent about what they have. How many individuals reuse passwords? Experts say over 90% of consumers reuse passwords between popular sites. And since email addresses are the most universally used “user name” on those sites, the chance of those 1.5 billion identities translating into millions of pirated activities is fairly high.
According to the recently published Ponemon study, 24% of respondents don’t know where their sensitive data is stored. That is a staggering number. Further complicating the issue, the same study notes that 65% of the respondents have no comprehensive data forensics capability. That means that consumers are more than likely never to hear from their provider that their data has been breached until it is too late.
So now I guess we all get to go change our passwords again. And we don’t know why, we just have to. This is annoying. But it’s not a permanent fix to have consumers constantly looking over their virtual shoulders. Let’s talk about the enterprise sized firms first. Ponemon indicates that 57% of respondents would like more trained data security personnel to protect data. And the enterprise firm should have the resources to task IT personnel to protect data. They also have the ability to license best in class technology to protect data. There is no excuse not to implement an enterprise data masking technology. This should be used hand in hand with network intrusion defenses to protect from end to end.
Smaller enterprises have similar options. The same data masking technology can be leveraged on a smaller scale by a smaller IT organization, including the personnel who optimize the infrastructure. Additionally, most small enterprises leverage cloud-based systems that should have the same defenses in place. Small enterprises should weight their buying criteria for data systems toward those that implement data masking technology.
Let me add a little fuel to the fire and talk about a different kind of cost. Insurers cover Cyber Risk either as part of a Commercial General Liability policy or as a separate policy. In 2013, insurers paid an average approaching $3.5M for each cyber breach claim. The average per record cost of claims was over $6,000. Now, imagine your enterprise’s slice of those 1.5 billion records. Obviously these are claims, not premiums. Premiums can range up to $40,000 per year for each $1M in coverage. Insurers will typically give discounts for those companies that have demonstrated security practices and infrastructure. I won’t belabor the point, it’s pure math at this point.
There is plenty of risk and cost to go around, to be sure. But there is a way to stay protected with Informatica. And now, let’s all take a few minutes to go change our passwords. I’ll wait right here. There, do you feel better?
For more information on Informatica’s data masking technology click here, where you can drill into dynamic and persistent data masking technology, leading in the industry. So you should still change your passwords…but check out the industry’s leading data security technology after you do.
I recently met with a longtime colleague from the Oracle E-Business Suite implementation eco-system, now VP of IT for a global technology provider. This individual has successfully implemented data archiving and data masking technologies to eliminate duplicate applications and control the costs of data growth – saving tens of millions of dollars. He has freed up resources that were re-deployed within new innovative projects such as Big Data – giving him the reputation as a thought leader. In addition, he has avoided exposing sensitive data in application development activities by securing it with data masking technology – thus securing his reputation.
When I asked him about those projects and the impact on his career, he responded, ‘Data archiving and data security are table stakes in the Oracle Applications IT game. However, if I want to be a part of anything important, it has to involve Cloud and Big Data.’ He further explained how the savings achieved from Informatica Data Archive enabled him to increase employee retention rates because he was able to fund an exciting Hadoop project that key resources wanted to work on. Not to mention, as he transitioned from physical infrastructure to a virtual server by retiring legacy applications – he had accomplished his first step on his ‘journey to the cloud’. This would not have been possible if his data required technology that was not supported in the cloud. If he hadn’t secured sensitive data and had experienced a breach, he would be looking for a new job in a new industry.
Not long after, I attended a CIO summit where the theme of the conference was ‘Breakthrough Innovation’. Of course, Cloud and Big Data were main stage topics – not just about the technology, but about how it was used to solve business challenges and provide services to the new generation of ‘entitled’ consumers. This is the description of those who expect to have everything at their fingertips. They want to be empowered to share or not share their information. They expect that if you are going to save their personal information, it will not be abused. Lastly, they may even expect to try a product or service for free before committing to buy.
In order to measure up to these expectations, Application Owners, like my long-time colleague, need to incorporate Data Archive and Data Masking in their standard SDLC processes. Without Data Archive, IT budgets may be consumed by supporting old applications and mountains of data, leaving nothing available for new innovative projects. Without Data Masking, a public breach will drive many consumers elsewhere.
- The RSA conference took place in San Francisco from February 24-28, 2014
- The IAPP Global Privacy Summit took place in Washington, DC from March 5-7, 2014
Data Privacy at the 2014 RSA Conference
The RSA conference was busy as expected, with over 30,000 attendees. Informatica co-sponsored an after-hours event with one of our partners, Imperva, at the Dark Circus. The event was standing room only and provided a great escape from the torrential rain. One highlight of RSA, for Informatica, is that we were honored with two of the 2014 Security Products Guide Awards:
- Informatica Dynamic Data Masking won the Gold Award for Database Security, Data Leakage Prevention/Extrusion Prevention
- Informatica Cloud Test Data Management and Security won the Bronze Award for New Products
Of particular interest to us was the growing recognition of data-centric security and privacy at RSA. I briefly met Bob Rudis, co-author of “Data Driven Security” which was featured at the onsite bookstore. In the book, Rudis has presented a great case for focusing on data as the center-point of security, through data analysis and visualization. From Informatica’s perspective, we also believe that a deep understanding of data and its relationships will escalate as a key driver of security policies and measures.
Data Privacy at the IAPP Global Privacy Summit
The IAPP Global Privacy Summit was an amazing event, small (2,500), but completely sold-out and overflowing its current venue. We exhibited and had the opportunity to meet CPOs, privacy, risk/compliance and security professionals from around the world, and had hundreds of conversations about the role of data discovery and masking for privacy. From the privacy perspective, it is all about finding, de-identifying and protecting PII, PCI and PHI. These privacy professionals have extensive legal and/or data security backgrounds and understand the need to safeguard privacy by using data masking. Many notable themes were present at IAPP:
- De-identification is a key topic area
- Concerns about outsourcing and contractors in application development and testing have driven test data management adoption
- No national US privacy regulations expected in the short-term
- Europe has active but uneven privacy enforcement (France: “name and shame”, UK: heavy fines, Spain: most active)
If you want to learn more about data privacy and security, you will find no better place than Informatica World 2014. There, you’ll learn about the latest data security trends, see updates to Informatica’s data privacy and security offerings, and find out how Informatica protects sensitive information in real time without requiring costly, time-consuming changes to applications and databases. Register TODAY!
In the first two issues I spent time looking at the need for states to pay attention to the digital health and safety of their citizens, followed by the oft forgotten need to understand and protect the non-production data. This is data that has often proliferated and then been ignored or forgotten about.
In many ways, non-production data is simpler to protect. Development and test systems can usually work effectively with realistic but not real PII data and realistic but not real volumes of data. On the other hand, production systems need the real production data complete with the wealth of information that enables individuals to be identified, and that data therefore presents a huge risk. If and when that data is compromised, either deliberately or accidentally, the consequences can be enormous, both in the impact on individual citizens and in the cost of remediation to the state. Many will remember the massive South Carolina data breach of late 2012, when over the course of 2 days a 74 GB database was downloaded and stolen. Around 3.8 million taxpayers and 1.9 million dependents had their social security information stolen, and 3.3 million “lost” bank account details. The citizens’ pain didn’t end there, as the company South Carolina picked to help its citizens seems to have tried to exploit the situation.
The biggest problem with securing production data is that there are numerous legitimate users and uses of that data, and most often just a small number of potentially malicious or accidental attempts at inappropriate or dangerous access. So the question is… how does a state agency protect its citizens’ sensitive data while at the same time ensuring that legitimate uses and users continue, without performance impacts or any disruption of access? Obviously each state needs to make its own determination as to what approach works best for them.
This video does a good job of explaining the scope of the overall data privacy/security problems and also reviews a number of successful approaches to protecting sensitive data in both production and non-production environments. What you’ll find is that database encryption is just the start and is fine if the database is “stolen” (unless of course the key is stolen along with the data!). Encryption locks the data away in the same way that a safe protects physical assets, but the same problem exists: if the key is stolen with the safe then all bets are off. Legitimate users are usually easily able to deliberately breach and steal the sensitive contents, and it’s these latter occasions we need to understand and protect against. Given that the majority of data breaches are “inside jobs”, we need to ensure that authorized users (end-users, DBAs, system administrators and so on) who have legitimate access only have access to the data they absolutely need, no more and no less.
So we have reached the end of the first series. In the first blog we looked at the need for states to place the same emphasis on the digital health and welfare of their citizens as they do on their physical and mental health. In the second we looked at the oft-forgotten area of non-production (development, testing, QA etc.) data. In this third and final piece we looked at the need for, and some options for providing, complete protection of production data.
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue); non-production systems of every sort, on the other hand, do not. Non-production systems most often need realistic, but not real, data and realistic, but not real, data volumes (except maybe for the performance/stress/throughput testing system). What needs to be done? A three point risk remediation plan would be a good place to start.
1. Understand the non-production data. Sophisticated data and schema profiling combined with NLP (Natural Language Processing) techniques helps to identify previously unrecognized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses, and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes; this limits the size of the risk and also has positive effects on performance, run-times, backups etc.
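Points 2 and 3 of the plan above can be sketched in Python. This is only an illustration under stated assumptions: the keyed hash (HMAC) is one possible way to get the same masked value wherever an attribute appears, the `SECRET` key, the `citizens` and `claims` tables, and their field names are all hypothetical, and the output is merely SSN-shaped rather than guaranteed to be a valid-looking number.

```python
import hashlib
import hmac

SECRET = b"demo-only-key"  # hypothetical; a real key would be managed securely

def mask_ssn(ssn):
    """Map an SSN to an SSN-shaped fake, identically in every table it appears."""
    digest = hmac.new(SECRET, ssn.encode(), hashlib.sha256).hexdigest()
    nine_digits = f"{int(digest, 16) % 10**9:09d}"
    return f"{nine_digits[:3]}-{nine_digits[3:5]}-{nine_digits[5:]}"

def subset(citizens, claims, keep_every=10):
    """Keep every Nth citizen and only the claims that reference a kept citizen."""
    kept = citizens[::keep_every]
    kept_ids = {c["id"] for c in kept}
    kept_claims = [cl for cl in claims if cl["citizen_id"] in kept_ids]
    return kept, kept_claims
```

Because the mapping is deterministic, joins between masked tables still line up, and because child rows are filtered by the kept parent keys, the smaller copy stays referentially consistent.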
Gartner has just published their 2013 Magic Quadrant for data masking, which covers both what they call static (i.e. permanent or persistent) masking and dynamic masking (more on this in the next issue). As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that from the top down state governments realize the importance and vulnerability of their citizens’ data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to succeed it needs a combination of cultural and organizational change (getting people to care) and the right technology; together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.