Tag Archives: data masking
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part II
- Do you need to protect data at rest (in storage), during transmission, and/or when accessed?
- Do some privileged users still need the ability to view the original sensitive data or does sensitive data need to be obfuscated at all levels?
- What is the granularity of controls that you need?
  - Datafile level
  - Table level
  - Row level
  - Field / column level
  - Cell level
- Do you need to be able to control viewing vs. modification of sensitive data?
- Do you need to maintain the original characteristics / format of the data (e.g. for testing, demo, development purposes)?
- Is response time latency / performance of high importance for the application? This can be the case for mission-critical production applications that need to maintain response times on the order of seconds or sub-seconds.
In order to help you determine which method of control is appropriate for your requirements, the following table provides a comparison of the different methods and their characteristics.
A combination of protection methods may be appropriate based on your requirements. For example, to protect data in non-production environments, you may want to use persistent data masking to ensure that no one has access to the original production data, since they don’t need it. This is especially true if your development and testing are outsourced to third parties. In addition, persistent data masking allows you to maintain the original characteristics of the data to ensure test data quality.
In production environments, you may want to use a combination of encryption and dynamic data masking. This is the case if you would like to ensure that all data at rest is protected against unauthorized users, yet you need to mask sensitive fields only from certain sets of authorized or privileged users, while the rest of your users should be able to view the data in the clear.
The best method or combination of methods will depend on each scenario and set of requirements for your environment and organization. As with any technology and solution, there is no one size fits all.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part I
- Which types of data should be protected?
- Which data should be classified as “sensitive?”
- Where is this sensitive data located?
- Which groups of users should have access to this data?
Because these questions come up frequently, it seems ideal to share a few guidelines on this topic.
When protecting the confidentiality and integrity of data, the first level of defense is authentication and access control. However, data with higher levels of sensitivity or confidentiality may require additional levels of protection, beyond regular authentication and authorization methods.
There are a number of control methods for securing sensitive data available in the market today, including:
- Encryption
- Persistent (Static) Data Masking
- Dynamic Data Masking
- Tokenization
- Retention management and purging
Encryption is a cryptographic method of encoding data. There are generally two methods of encryption: symmetric (using a single secret key) and asymmetric (using public and private keys). Although there are methods of deciphering encrypted information without possessing the key, a good encryption algorithm makes it very difficult to decode the encrypted data without knowledge of the key. Key management is usually a major concern with this method of control. Encryption is ideal for mass protection of data (e.g. an entire data file, table, partition, etc.) against unauthorized users.
Persistent or static data masking obfuscates data at rest in storage. There is usually no way to retrieve the original data – the data is permanently masked. There are multiple techniques for masking data, including: shuffling, substitution, aging, encryption, domain-specific masking (e.g. email address, IP address, credit card, etc.), dictionary lookup, randomization, etc. Depending on the technique, there may be ways to perform reverse masking - this should be used sparingly. Persistent masking is ideal for cases where no users should see the original sensitive data (e.g. for test / development environments) and field level data protection is required.
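To make a few of these techniques concrete, here is a minimal Python sketch of substitution (via a dictionary lookup), shuffling, and date aging applied to a small in-memory table. The column names, the `FAKE_LAST_NAMES` list, and the `mask_table` helper are all hypothetical and exist only for illustration; a real masking tool would work against the database itself and handle far more data types.

```python
import random
from datetime import date, timedelta

# Hypothetical substitution dictionary; a real one would be much larger.
FAKE_LAST_NAMES = ["Avery", "Brooks", "Castillo", "Dalton", "Ellis"]


def substitute(values, dictionary, rng):
    """Substitution: replace each value with a random dictionary entry."""
    return [rng.choice(dictionary) for _ in values]


def shuffle_column(values, rng):
    """Shuffling: reorder a column so values no longer match their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def age_dates(values, rng, max_days=365):
    """Aging: shift each date by a random offset within +/- max_days."""
    return [d + timedelta(days=rng.randint(-max_days, max_days)) for d in values]


def mask_table(rows, seed=None):
    """Return a masked copy of rows; no mapping back to the originals is kept."""
    rng = random.Random(seed)
    last_names = substitute([r["last_name"] for r in rows], FAKE_LAST_NAMES, rng)
    salaries = shuffle_column([r["salary"] for r in rows], rng)
    birth_dates = age_dates([r["birth_date"] for r in rows], rng)
    return [
        {"id": r["id"], "last_name": n, "salary": s, "birth_date": b}
        for r, n, s, b in zip(rows, last_names, salaries, birth_dates)
    ]


rows = [
    {"id": 1, "last_name": "Nguyen", "salary": 72000, "birth_date": date(1980, 3, 14)},
    {"id": 2, "last_name": "Okafor", "salary": 58000, "birth_date": date(1975, 11, 2)},
    {"id": 3, "last_name": "Schmidt", "salary": 91000, "birth_date": date(1990, 7, 21)},
]
masked = mask_table(rows, seed=42)
for row in masked:
    print(row)
```

Note that the masked table keeps the original format and value ranges (real-looking names, the same set of salaries, plausible dates), which is what makes it usable as test data.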
Dynamic data masking de-identifies data when it is accessed. The original data is still stored in the database. Dynamic data masking (DDM) acts as a proxy between the application and database and rewrites the user / application request against the database depending on whether the user has the privilege to view the data or not. If the requested data is not sensitive or the user is a privileged user who has the permission to access the sensitive data, then the DDM proxy passes the request to the database without modification, and the result set is returned to the user in the clear. If the data is sensitive and the user does not have the privilege to view the data, then the DDM proxy rewrites the request to include a masking function and passes the request to the database to execute. The result is returned to the user with the sensitive data masked. Dynamic data masking is ideal for protecting sensitive fields in production systems where application changes are difficult or disruptive to implement and performance / response time is of high importance.
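The request-rewriting idea can be sketched in a few lines of Python using the standard library's `sqlite3` module. This is only an illustration under simplifying assumptions: the `customers` table, the `MASKING_RULES` policy, and the role names are hypothetical, and the "proxy" here builds the query from a trusted column list rather than parsing arbitrary SQL as a real DDM product would.

```python
import sqlite3

# Hypothetical policy: sensitive columns and the SQL expression that masks them.
MASKING_RULES = {
    "ssn": "'XXX-XX-' || substr(ssn, -4)",
}
# Hypothetical roles that may see sensitive data in the clear.
PRIVILEGED_ROLES = {"auditor"}


def rewrite_query(columns, table, role):
    """Build a SELECT, swapping sensitive columns for masking expressions
    unless the caller's role is privileged. Column and table names are
    assumed to come from trusted configuration, not from user input."""
    select_items = []
    for col in columns:
        if col in MASKING_RULES and role not in PRIVILEGED_ROLES:
            select_items.append(f"{MASKING_RULES[col]} AS {col}")
        else:
            select_items.append(col)
    return f"SELECT {', '.join(select_items)} FROM {table} ORDER BY id"


def run_as(conn, role, columns, table):
    """Execute the (possibly rewritten) query on behalf of the given role."""
    return conn.execute(rewrite_query(columns, table, role)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann Lee", "123-45-6789"), (2, "Raj Patel", "987-65-4321")],
)

columns = ["id", "name", "ssn"]
clear_rows = run_as(conn, "auditor", columns, "customers")   # unmodified request
masked_rows = run_as(conn, "support", columns, "customers")  # rewritten request
print(clear_rows)
print(masked_rows)
```

The stored data never changes; only the request is rewritten, so the database itself applies the masking function and the application needs no modification.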
Tokenization substitutes a sensitive data element with a non-sensitive data element or token. The first generation tokenization system requires a token server and a database to store the original sensitive data. The mapping from the clear text to the token makes it very difficult to reverse the token back to the original data without the token system. The existence of a token server and database storing the original sensitive data renders the token server and mapping database as a potential point of security vulnerability, bottleneck for scalability, and single point of failure. Next generation tokenization systems have addressed these weaknesses. However, tokenization does require changes to the application layer to tokenize and detokenize when the sensitive data is accessed. Tokenization can be used in production systems to protect sensitive data at rest in the database store, when changes to the application layer can be made relatively easily to perform the tokenization / detokenization operations.
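As a rough illustration of the first-generation design described above, the following Python sketch keeps the token-to-value mapping in an in-memory dictionary that stands in for the token server and its database. The `TokenVault` class and the `tok_` prefix are hypothetical, and a production system would need durable, access-controlled storage for the mapping.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a token server plus its mapping database."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return the existing token for a value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111111111111111"
token = vault.tokenize(card_number)

# The application stores and passes around the token instead of the card number.
print(token)
print(vault.detokenize(token) == card_number)
```

Because the token is random rather than derived from the card number, it cannot be reversed without the vault, which is also why the vault becomes the sensitive component to secure and scale.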
Retention management and purging is more of a data management method to ensure that data is retained only as long as necessary. The best method of reducing data privacy risk is to eliminate the sensitive data. Therefore, appropriate retention, archiving, and purging policies should be applied to reduce the privacy and legal risks of holding on to sensitive data for too long. Retention management and purging is a data management best practice that should always be put to use.
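A purging policy can be as simple as a scheduled job that deletes records past their retention period. The sketch below uses Python's built-in `sqlite3` module; the `customer_records` table, the `created_on` column, and the 365-day retention period are assumptions made for the example, and a real policy would typically archive before purging and vary by data class.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 365  # hypothetical policy value


def purge_expired(conn, now, retention_days=RETENTION_DAYS):
    """Delete records older than the retention period; return how many were removed."""
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d")
    # ISO-formatted date strings compare correctly as plain text.
    cur = conn.execute("DELETE FROM customer_records WHERE created_on < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_records (id INTEGER PRIMARY KEY, created_on TEXT)")
conn.executemany(
    "INSERT INTO customer_records VALUES (?, ?)",
    [(1, "2012-01-15"), (2, "2013-05-31"), (3, "2013-06-02"), (4, "2014-03-01")],
)

purged = purge_expired(conn, now=datetime(2014, 6, 1))
remaining = [row[0] for row in conn.execute("SELECT id FROM customer_records ORDER BY id")]
print(purged, remaining)
```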
CIOs and CFOs both dig data security
In my discussions with CIOs over the last couple of months, I asked them about the importance of a series of topics. All of them placed data security at the top of their IT priority list. Even their CFO counterparts, with whom they do not always see eye to eye, said they were very concerned about the business risk for corporate data. These CFOs said that, as part of owning business risk, they are responsible for security, especially against hacking. One CFO said that he worried, as well, about the impact of data security on compliance issues, including HIPAA and SOX. Another said this: “The security of data is becoming more and more important. The auditors are going after this. CFOs, for this reason, are really worried about getting hacked. This is a whole new direction, but some of the highly publicized recent hacks have scared a lot of folks and they combined represent to many of us a watershed event.”
According to David W. Owens, the editor of CFO Magazine, “even if you are using ‘secure’ storage, such as internal drives and private clouds, the access to these areas can be anything but secure. Practically any employee can be carrying around sensitive financial and performance data in his or her pocket, at any time.” Obviously, new forms of data access have created new forms of data risk.
Are some retailers really leaving the keys in the ignition?
Given the like mindset of CIOs and CFOs, I was shocked to learn that some of the recently hacked retailers had been using outdated security software, which may have given hackers easier access to company payment data systems. Most amazingly, some retailers had not even encrypted their customer payment data. Because of this, hackers were able to hide on the network for months and steal payment data, as customers continued to use their credit cards at the company’s point of sale locations.
Why weren’t these transactions encrypted or masked? In my 1998 financial information start-up, we encrypted our databases to protect against hacks of our customers’ personal financial data. One answer came from a discussion with a Fortune 100 Insurance CIO. This CIO said, “CIOs/CTOs/CISOs struggle with selling the value of these investments because the C Suite is only interested in hearing about investments with a direct impact on business outcomes and benefits.”
Enterprise security drives enterprise brand today
So how should leaders better argue the business case for security investments? I want to suggest that the value of IT is its “brand promise”. For retailers, in particular, if a past purchase decision creates a perceived personal data security risk, IT becomes a liability to their corporation’s brand equity and potentially creates a negative impact on future sales. Increasingly, how these factors are managed either supports or undermines the value of a company’s brand.
My message is this: spend whatever it takes to protect your brand equity; otherwise, a security issue will become a revenue issue.
In sum, this means organizations that want to differentiate themselves and avoid becoming a brand liability need to further invest in their data-centric security strategy and, of course, encryption. The game is no longer just about securing particular applications. IT organizations need to take a data-centric approach to securing customer data and other types of enterprise data. Enterprise-level data governance rules need to be a requirement. A data-centric approach can mitigate business risk by helping organizations to understand where sensitive data is and to protect it in motion and at rest.
The information security industry is reporting that more than 1.5 billion (yes, that’s with a “B”) emails and passwords have been hacked. It’s hard to tell from the article, but this could be the big one. (And just when we thought that James Bond had taken care of the Russian mafia.) From both large and small companies, nobody is safe. According to the experts the sites ranged from small e-commerce sites to Fortune 500 companies. At this time the experts aren’t telling us who the big targets were. We could be very unpleasantly surprised.
Most security experts admit that the bulk of the post-breach activity will be email spamming. Insidious to be sure. But imagine if the hackers were to get a little more intelligent about what they have. How many individuals reuse passwords? Experts say over 90% of consumers reuse passwords between popular sites. And since email addresses are the most universally used “user name” on those sites, the chance of those 1.5 billion identities translating into millions of pirated activities is fairly high.
According to the recently published Ponemon study, 24% of respondents don’t know where their sensitive data is stored. That is a staggering number. Further complicating the issue, the same study notes that 65% of the respondents have no comprehensive data forensics capability. That means that consumers are more than likely to never hear from their provider that their data has been breached. Until it is too late.
So now I guess we all get to go change our passwords again. And we don’t know why, we just have to. This is annoying. But it’s not a permanent fix to have consumers constantly looking over their virtual shoulders. Let’s talk about the enterprise sized firms first. Ponemon indicates that 57% of respondents would like more trained data security personnel to protect data. And the enterprise firm should have the resources to task IT personnel to protect data. They also have the ability to license best in class technology to protect data. There is no excuse not to implement an enterprise data masking technology. This should be used hand in hand with network intrusion defenses to protect from end to end.
Smaller enterprises have similar options. The same data masking technology can be leveraged on a smaller scale by a smaller IT organization, including the personnel to optimize the infrastructure. Additionally, most small enterprises leverage Cloud based systems that should have the same defenses in place. The small enterprise should bias their buying criteria in data systems toward those that implement data masking technology.
Let me add a little fuel to the fire and talk about a different kind of cost. Insurers cover Cyber Risk either as part of a Commercial General Liability policy or as a separate policy. In 2013, insurers paid an average approaching $3.5M for each cyber breach claim. The average per record cost of claims was over $6,000. Now, imagine your enterprise’s slice of those 1.5 billion records. Obviously these are claims, not premiums. Premiums can range up to $40,000 per year for each $1M in coverage. Insurers will typically give discounts to those companies that have demonstrated security practices and infrastructure. I won’t belabor the point; it’s pure math at this point.
There is plenty of risk and cost to go around, to be sure. But there is a way to stay protected with Informatica. And now, let’s all take a few minutes to go change our passwords. I’ll wait right here. There, do you feel better?
For more information on Informatica’s data masking technology click here, where you can drill into dynamic and persistent data masking technology, leading in the industry. So you should still change your passwords…but check out the industry’s leading data security technology after you do.
I recently met with a longtime colleague from the Oracle E-Business Suite implementation eco-system, now VP of IT for a global technology provider. This individual has successfully implemented data archiving and data masking technologies to eliminate duplicate applications and control the costs of data growth – saving tens of millions of dollars. He has freed up resources that were re-deployed within new innovative projects such as Big Data – giving him the reputation as a thought leader. In addition, he has avoided exposing sensitive data in application development activities by securing it with data masking technology – thus securing his reputation.
When I asked him about those projects and the impact on his career, he responded, ‘Data archiving and data security are table stakes in the Oracle Applications IT game. However, if I want to be a part of anything important, it has to involve Cloud and Big Data.’ He further explained how the savings achieved from Informatica Data Archive enabled him to increase employee retention rates because he was able to fund an exciting Hadoop project that key resources wanted to work on. Not to mention, as he transitioned from physical infrastructure to a virtual server by retiring legacy applications – he had accomplished his first step on his ‘journey to the cloud’. This would not have been possible if his data required technology that was not supported in the cloud. If he hadn’t secured sensitive data and had experienced a breach, he would be looking for a new job in a new industry.
Not long after, I attended a CIO summit where the theme of the conference was ‘Breakthrough Innovation’. Of course, Cloud and Big Data were main stage topics – not just about the technology, but about how it was used to solve business challenges and provide services to the new generation of ‘entitled’ consumers. This is the description of those who expect to have everything at their fingertips. They want to be empowered to share or not share their information. They expect that if you are going to save their personal information, it will not be abused. Lastly, they may even expect to try a product or service for free before committing to buy.
In order to measure up to these expectations, Application Owners, like my long-time colleague, need to incorporate Data Archive and Data Masking in their standard SDLC processes. Without Data Archive, IT budgets may be consumed by supporting old applications and mountains of data, leaving nothing available for new innovative projects. Without Data Masking, a public breach will drive many consumers elsewhere.
- The RSA conference took place in San Francisco from February 24-28, 2014
- The IAPP Global Privacy Summit took place in Washington, DC from March 5-7, 2014
Data Privacy at the 2014 RSA Conference
The RSA conference was busy as expected, with over 30,000 attendees. Informatica co-sponsored an after-hours event with one of our partners, Imperva, at the Dark Circus. The event was standing room only and provided a great escape from the torrential rain. One highlight of RSA, for Informatica, is that we were honored with two of the 2014 Security Products Guide Awards:
- Informatica Dynamic Data Masking won the Gold Award for Database Security, Data Leakage Prevention/Extrusion Prevention
- Informatica Cloud Test Data Management and Security won the Bronze Award for New Products
Of particular interest to us was the growing recognition of data-centric security and privacy at RSA. I briefly met Bob Rudis, co-author of “Data Driven Security” which was featured at the onsite bookstore. In the book, Rudis has presented a great case for focusing on data as the center-point of security, through data analysis and visualization. From Informatica’s perspective, we also believe that a deep understanding of data and its relationships will escalate as a key driver of security policies and measures.
Data Privacy at the IAPP Global Privacy Summit
The IAPP Global Privacy Summit was an amazing event, small (2,500), but completely sold-out and overflowing its current venue. We exhibited and had the opportunity to meet CPOs, privacy, risk/compliance and security professionals from around the world, and had hundreds of conversations about the role of data discovery and masking for privacy. From the privacy perspective, it is all about finding, de-identifying and protecting PII, PCI and PHI. These privacy professionals have extensive legal and/or data security backgrounds and understand the need to safeguard privacy by using data masking. Many notable themes were present at IAPP:
- De-identification is a key topic area
- Concerns about outsourcing and contractors in application development and testing have driven test data management adoption
- No national US privacy regulations expected in the short-term
- Europe has active but uneven privacy enforcement (France: “name and shame”, UK: heavy fines, Spain: most active)
If you want to learn more about data privacy and security, you will find no better place than Informatica World 2014. There, you’ll learn about the latest data security trends, see updates to Informatica’s data privacy and security offerings, and find out how Informatica protects sensitive information in real time without requiring costly, time-consuming changes to applications and databases. Register TODAY!
In the first two issues I spent time looking at the need for states to pay attention to the digital health and safety of their citizens, followed by the oft-forgotten need to understand and protect non-production data. This is data that has often proliferated and then been ignored or forgotten about.
In many ways, non-production data is simpler to protect. Development and test systems can usually work effectively with realistic but not real PII data and realistic but not real volumes of data. On the other hand, production systems need the real production data, complete with the wealth of information that enables individuals to be identified, and that therefore presents a huge risk. If and when that data is compromised, either deliberately or accidentally, the consequences can be enormous, both in the impact on individual citizens and in the cost of remediation for the state. Many will remember the massive South Carolina data breach of late 2012, when over the course of 2 days a 74 GB database was downloaded and stolen: around 3.8 million payers and 1.9 million dependents had their social security information stolen, and 3.3 million “lost” bank account details. The citizens’ pain didn’t end there, as the company South Carolina picked to help its citizens seems to have tried to exploit the situation.
The biggest problem with securing production data is that there are numerous legitimate users and uses of that data, and most often just a small number of potentially malicious or accidental attempts at inappropriate or dangerous access. So the question is… how does a state agency protect its citizens’ sensitive data while at the same time ensuring that legitimate uses and users continue, without performance impacts or any disruption of access? Obviously each state needs to make its own determination as to what approach works best for them.
This video does a good job of explaining the scope of the overall data privacy/security problems and also reviews a number of successful approaches to protecting sensitive data in both production and non-production environments. What you’ll find is that database encryption is just the start and is fine if the database is “stolen” (unless, of course, the key is stolen along with the data!). Encryption locks the data away in the same way that a safe protects physical assets, but the same problem exists: if the key is stolen with the safe, then all bets are off. Legitimate users are usually easily able to deliberately breach and steal the sensitive contents, and it’s these latter occasions we need to understand and protect against. Given that the majority of data breaches are “inside jobs”, we need to ensure that authorized users (end-users, DBAs, system administrators and so on) who have legitimate access only have access to the data they absolutely need, no more and no less.
So we have reached the end of the first series. In the first blog we looked at the need for states to place the same emphasis on the digital health and welfare of their citizens as they do on their physical and mental health. In the second we looked at the oft-forgotten area of non-production (development, testing, QA etc.) data. In this third and final piece we looked at the need for, and some options for providing, the complete protection of production data.
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue); on the other hand, non-production systems of every sort do not. Non-production systems most often need realistic, but not real, data and realistic, but not real, data volumes (except maybe for the performance/stress/throughput testing system). What needs to be done? Well, a three point risk remediation plan would be a good place to start.
1. Understand the non-production data. Sophisticated data and schema profiling combined with NLP (Natural Language Processing) techniques helps to identify previously unrealized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses, and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes; this limits the size of the risk and also has positive effects on performance, run-times, backups etc.
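The three steps above can be sketched in Python as follows. This is a deliberately simplified illustration: a regular expression stands in for real data profiling and NLP, the keyed hash (HMAC) is just one way to get consistent masking across tables, and the table contents, column names, and `MASKING_KEY` are all hypothetical.

```python
import hashlib
import hmac
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
MASKING_KEY = b"demo-key-not-for-production"  # hypothetical secret


def find_ssn_columns(rows):
    """Step 1 (simplified): flag columns whose values all look like SSNs."""
    if not rows:
        return set()
    return {
        col for col in rows[0]
        if all(isinstance(r[col], str) and SSN_PATTERN.match(r[col]) for r in rows)
    }


def mask_ssn(value):
    """Step 2: derive a fake SSN from a keyed hash, so the same input
    always yields the same masked value in every table or file."""
    digest = hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).hexdigest()
    digits = f"{int(digest, 16) % 10**9:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def mask_rows(rows):
    """Mask every column that profiling flagged as containing SSNs."""
    sensitive = find_ssn_columns(rows)
    return [{k: mask_ssn(v) if k in sensitive else v for k, v in r.items()} for r in rows]


def subset(rows, keep_every=2):
    """Step 3: keep every Nth row to shrink the non-production copy."""
    return rows[::keep_every]


citizens = [
    {"id": 1, "name": "A. Jones", "ssn": "123-45-6789"},
    {"id": 2, "name": "B. Smith", "ssn": "987-65-4321"},
    {"id": 3, "name": "C. Brown", "ssn": "555-12-3456"},
    {"id": 4, "name": "D. Green", "ssn": "222-33-4444"},
]
claims = [
    {"claim_id": 10, "member_ssn": "123-45-6789", "amount": 250},
    {"claim_id": 11, "member_ssn": "555-12-3456", "amount": 90},
]

masked_citizens = subset(mask_rows(citizens))
masked_claims = mask_rows(claims)
print(masked_citizens)
print(masked_claims)
```

Because the masking is keyed and deterministic, a citizen's masked SSN in one table still joins to the same masked SSN in another, even though the column names differ and neither value is the real one.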
Gartner has just published their 2013 magic quadrant for data masking. This covers both what they call static (i.e. permanent or persistent) masking and dynamic masking (more on this in the next issue). As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that, from the top down, state governments realize the importance and vulnerability of their citizens’ data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to succeed it needs a combination of cultural and organizational change (getting people to care) and the right technology; together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Informatica announced, once again, that it is listed as a leader in the industry’s second Gartner Magic Quadrant for Data Masking Technology. With data security continuing to grow as one of the fastest segments in the enterprise software market, technologies such as data masking are becoming the solution of choice for data-centric security.
Increased fear of cyber-attacks and internal data breaches has led to predictions that 2014 will be the year of preventative and tactical measures to ensure corporate data assets are safe. Data masking should be included in those measures. According to Gartner,
“Security program managers need to take a strategic approach with tactical best-practice technology configurations in order to properly address the most common advanced targeted attack scenarios to increase both detection and prevention capabilities.”
Without these measures, the cost of an attack or breach is growing every year. The Ponemon Institute posted in a recent study:
“The 2013 Cost of Cyber Crime Study states that the average annualized cost of cybercrime incurred by a benchmark sample of US organizations was $11.56 million, nearly 78% more than the cost estimated in the first analysis conducted 4 years ago.”
Informatica believes that the best preventative measures include a layered approach for data security but without sacrificing agility or adding unnecessary costs. Data Masking delivers data-centric security with improved productivity and reduced overall costs.
Data Masking prevents internal data theft and abuse of sensitive data by hiding it from users. Data masking techniques include replacing some fields with similar-looking characters, masking characters (for example, “x”), substituting real last names with fictional last names and shuffling data within columns – to name a few. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization. Call it what you like, but without it, organizations may continue to expose sensitive data to those with malicious intentions.
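Two of the character-level techniques mentioned here, masking with “x” and replacing characters with similar-looking ones, can be illustrated with a short Python sketch. The function names and the choice to keep the last four characters visible are assumptions for the example, not a description of any particular product.

```python
import random
import string


def mask_with_x(value, keep_last=4):
    """Replace letters and digits with 'x', leaving punctuation and the
    last keep_last alphanumeric characters visible."""
    kept = 0
    out = []
    for ch in reversed(value):
        if ch.isalnum():
            if kept < keep_last:
                out.append(ch)
                kept += 1
            else:
                out.append("x")
        else:
            out.append(ch)
    return "".join(reversed(out))


def replace_similar(value, rng):
    """Swap each digit for a random digit and each ASCII letter for a random
    letter of the same case, so the result keeps the original format."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha() and ch.isascii():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


card = "4111-1111-1111-1234"
x_masked = mask_with_x(card)
similar = replace_similar(card, random.Random(7))
print(x_masked)
print(similar)
```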
To learn more, Download the Gartner Magic Quadrant Data Masking Report now. And visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose
A data integration hub is a proven vehicle for providing a self-service model for publishing and subscribing to data that is made available to a variety of users. Those who deploy these environments for regulated and sensitive data need to think of data privacy and data governance during the design phase of the project.
In the data integration hub architecture, think about how sensitive data will be coming from different locations, from a variety of technology platforms, and certainly from systems being managed by teams with a wide range of data security skills. How can you ensure data will be protected across such a heterogeneous environment, especially if data crosses national boundaries?
Then think about testing connectivity. If data needs to be validated in a data quality rules engine, then in order to truly test this connectivity there needs to be a capability to test using valid data. However, testers should not have access to or visibility into the actual data itself if it is classified as sensitive or confidential.
With a hub and spoke model, the rules are difficult to enforce if data is being requested from one country and received in another. The opportunity for human error and potential data leakage increases exponentially. Rather than reading about a breach in the headlines, it may make sense to look at building preventative measures or spending the time and money to do the right thing from the onset of the project.
There are easy-to-implement technologies on the market that are designed to prevent this very type of exposure. This technology is called data masking, which includes data obfuscation, encryption and tokenization. Informatica’s Data Privacy solution, based on persistent and dynamic data masking options, can be easily and quickly deployed without the need to develop code or modify the source or target application.
When developing your reference architecture for a data integration hub, incorporate sound data governance policies and build data privacy into the application upfront. Don’t wait for the headlines to include your company and someone’s personal data.