Tag Archives: data masking
- PII – Personally Identifiable Information – any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII
- GSA’s Rules of Behavior for Handling Personally Identifiable Information – This directive provides GSA’s policy on how to properly handle PII and the consequences and corrective actions that will be taken if a breach occurs
- PHI – Protected Health Information – any information about health status, provision of health care, or payment for health care that can be lined to a specific individual
- HIPAA Privacy Rule – The HIPAA Privacy Rule establishes national standards to protect individuals’ medical records and other personal health information and applies to health plans, health care clearinghouses, and those health care providers that conduct certain health care transactions electronically. The Rule requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization. The Rule also gives patients rights over their health information, including rights to examine and obtain a copy of their health records, and to request corrections.
- Encryption – a method of protecting data by scrambling it into an unreadable form. It is a systematic encoding process which is only reversible with the right key.
- Tokenization – a method of replacing sensitive data with non-sensitive placeholder tokens. These tokens are swapped with data stored in relational databases and files.
- Data masking – a process that scrambles data, either an entire database or a subset. Unlike encryption, masking is not reversible; unlike tokenization, masked data is useful for limited purposes. There are several types of data masking:
- Static data masking (SDM) masks data in advance of using it. Non production databases masked NOT in real-time.
- Dynamic data masking (DDM) masks production data in real time
- Data Redaction – masks unstructured content (PDF, Word, Excel)
Each of the three methods for protecting data (encryption, tokenization and data masking) have different benefits and work to solve different security issues . We’ll address them in a bit. For a visual representation of the three methods – please see the table below:
For protecting PHI data – encryption is superior to tokenization. You encrypt different portions of personal healthcare data under different encryption keys. Only those with the requisite keys can see the data. This form of encryption requires advanced application support to manage the different data sets to be viewed or updated by different audiences. The key management service must be very scalable to handle even a modest community of users. Record management is particularly complicated. Encryption works better than tokenization for PHI – but it does not scale well.
Properly deployed, encryption is a perfectly suitable tool for protecting PII. It can be set up to protect archived data or data residing on file systems without modification to business processes.
- To protect the data, you must install encryption and key management services to protect the data – this only protects the data from access that circumvents applications
- You can add application layer encryption to protect data in use
- This requires changing applications and databases to support the additional protection
- You will pay the cost of modification and the performance of the application will be impacted
For tokenization of PHI – there are many pieces of data which must be bundled up in different ways for many different audiences. Using the tokenized data requires it to be de-tokenized (which usually includes a decryption process). This introduces an overhead to the process. A person’s medical history is a combination of medical attributes, doctor visits, outsourced visits. It is an entangled set of personal, financial, and medical data. Different groups need access to different subsets. Each audience needs a different slice of the data – but must not see the rest of it. You need to issue a different token for each and every audience. You will need a very sophisticated token management and tracking system to divide up the data, issuing and tracking different tokens for each audience.
Masking can scramble individual data columns in different ways so that the masked data looks like the original (retaining its format and data type) but it is no longer sensitive data. Masking is effective for maintaining aggregate values across an entire database, enabling preservation of sum and average values within a data set, while changing all the individual data elements. Masking plus encryption provide a powerful combination for distribution and sharing of medical information
Traditionally, data masking has been viewed as a technique for solving a test data problem. The December 2014 Gartner Magic Quadrant Report on Data Masking Technology extends the scope of data masking to more broadly include data de-identification in production, non-production, and analytic use cases. The challenge is to do this while retaining business value in the information for consumption and use.
Masked data should be realistic and quasi-real. It should satisfy the same business rules as real data. It is very common to use masked data in test and development environments as the data looks like “real” data, but doesn’t contain any sensitive information.
Let’s face it, building a Data Governance program is no overnight task. As one CDO puts it: ”data governance is a marathon, not a sprint”. Why? Because data governance is a complex business function that encompasses technology, people and process, all of which have to work together effectively to ensure the success of the initiative. Because of the scope of the program, Data Governance often calls for participants from different business units within an organization, and it can be disruptive at first.
Why bother then? Given that data governance is complex, disruptive, and could potentially introduce additional cost to a company? Well, the drivers for data governance can vary for different organizations. Let’s take a close look at some of the motivations behind data governance program.
For companies in heavily regulated industries, establishing a formal data governance program is a mandate. When a company is not compliant, consequences can be severe. Penalties could include hefty fines, brand damage, loss in revenue, and even potential jail time for the person who is held accountable for being noncompliance. In order to meet the on-going regulatory requirements, adhere to data security policies and standards, companies need to rely on clean, connected and trusted data to enable transparency, auditability in their reporting to meet mandatory requirements and answer critical questions from auditors. Without a dedicated data governance program in place, the compliance initiative could become an on-going nightmare for companies in the regulated industry.
A data governance program can also be established to support customer centricity initiative. To make effective cross-sells and ups-sells to your customers and grow your business, you need clear visibility into customer purchasing behaviors across multiple shopping channels and touch points. Customer’s shopping behaviors and their attributes are captured by the data, therefore, to gain thorough understanding of your customers and boost your sales, a holistic Data Governance program is essential.
Other reasons for companies to start a data governance program include improving efficiency and reducing operational cost, supporting better analytics and driving more innovations. As long as it’s a business critical area and data is at the core of the process, and the business case is loud and sound, then there is a compelling reason for launching a data governance program.
Now that we have identified the drivers for data governance, how do we start? This rather loaded question really gets into the details of the implementation. A few critical elements come to consideration including: identifying and establishing various task forces such as steering committee, data governance team and business sponsors; identifying roles and responsibilities for the stakeholders involved in the program; defining metrics for tracking the results. And soon you will find that on top of everything, communications, communications and more communications is probably the most important tactic of all for driving the initial success of the program.
A rule of thumb? Start small, take one-step at a time and focus on producing something tangible.
Sounds easy, right? Well, let’s hear what the real-world practitioners have to say. Join us at this Informatica webinar to hear Michael Wodzinski, Director of Information Architecture, Lisa Bemis, Director of Master Data, Fabian Torres, Director of Project Management from Houghton Mifflin Harcourt, global leader in publishing, as well as David Lyle, VP of product strategy from Informatica to discuss how to implement a successful data governance practice that brings business impact to an enterprise organization.
If you are currently kicking the tires on setting up data governance practice in your organization, I’d like to invite you to visit a member-only website dedicated to Data Governance: http://governyourdata.com/. This site currently has over 1,000 members and is designed to foster open communications on everything data governance. There you will find conversations on best practices, methodologies, frame works, tools and metrics. I would also encourage you to take a data governance maturity assessment to see where you currently stand on the data governance maturity curve, and compare the result against industry benchmark. More than 200 members have taken the assessment to gain better understanding of their current data governance program, so why not give it a shot?
Data Governance is a journey, likely a never-ending one. We wish you best of the luck on this effort and a joyful ride! We love to hear your stories.
I live in a very small town in Maine. I don’t spend a lot of time thinking about my privacy. Some would say that by living in a small town, you give up your right to privacy because everyone knows what everyone else is doing. Living here is a choice – for me to improve my family’s quality of life. Sharing all of the details of my life – not so much.
When I go to my doctor (who also happens to be a parent from my daughter’s school), I fully expect that any sort of information that I share with him, or that he obtains as a result of lab tests or interviews, or care that he provides is not available for anyone to view. On the flip side, I want researchers to be able to take my lab information combined with my health history in order to do research on the effectiveness of certain medications or treatment plans.
As a result of this dichotomy, Congress (in 1996) started to address governance regarding the transmission of this type of data. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a Federal law that sets national standards for how health care plans, health care clearinghouses, and most health care providers protect the privacy of a patient’s health information. With certain exceptions, the Privacy Rule protects a subset of individually identifiable health information, known as protected health information or PHI, that is held or maintained by covered entities or their business associates acting for the covered entity. PHI is any information held by a covered entity which concerns health status, provision of health care, or payment for health care that can be linked to an individual.
Many payers have this type of data in their systems (perhaps in a Claims Administration system), and have the need to share data between organizational entities. Do you know if PHI data is being shared outside of the originating system? Do you know if PHI is available to resources that have no necessity to access this information? Do you know if PHI data is being shared outside your organization?
If you can answer yes to each of these questions – fantastic. You are well ahead of the curve. If not – you need to start considering solutions that can
- Identify PHI in all of your data streams
- Monitor and track the flow of this data throughout your organization and
- Mask this data if it is being shared with resources that don’t need to be able to identify the individual.
I want to researchers to have access to medically relevant data so they can find the cures to some horrific diseases. I want to feel comfortable sharing health information with my doctor. I want to feel comfortable that my health insurance company is respecting my privacy. Now to get my kids to stop oversharing.
In the report, Gartner cites. “Global-scale scandals around sensitive data losses have highlighted the need for effective data protection, especially from insider attacks. Data masking, which is focused on protecting data from insiders and outsiders, is a must-have technology in enterprises’ and governments’ security portfolios.”
Organizations realize that data protection must be hardened to protect against the inevitable breach; originating from either internal or external threats. Data masking covers gaps in data protection in production and non-production environments that can be exploited by attackers.
Informatica customers are elevating the importance of data security initiatives in 2015 given the high exposure of recent breaches and the shift from just stealing identities and intellectual property, to politically charged platforms. This raises the concern that existing security controls are insufficient and a more data-centric security approach is necessary.
Recent enforcement by the Federal Trade Commission in the US and emerging legislation worldwide has clearly indicated that sensitive data access and sharing should be tightly controlled; this is the strength of data masking.
Data Masking de-identifies and/or de-sensitizes private and confidential data by hiding it from those who are unauthorized to access it. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization.
To learn more, Download the Gartner Magic Quadrant Data Masking Report now. And visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part II
- Do you need to protect data at rest (in storage), during transmission, and/or when accessed?
- Do some privileged users still need the ability to view the original sensitive data or does sensitive data need to be obfuscated at all levels?
- What is the granularity of controls that you need?
- Datafile level
- Table level
- Row level
- Field / column level
- Cell level
- Do you need to be able to control viewing vs. modification of sensitive data?
- Do you need to maintain the original characteristics / format of the data (e.g. for testing, demo, development purposes)?
- Is response time latency / performance of high importance for the application? This can be the case for mission critical production applications that need to maintain response times in the order of seconds or sub-seconds.
In order to help you determine which method of control is appropriate for your requirements, the following table provides a comparison of the different methods and their characteristics.
A combination of protection method may be appropriate based on your requirements. For example, to protect data in non-production environments, you may want to use persistent data masking to ensure that no one has access to the original production data, since they don’t need to. This is especially true if your development and testing is outsourced to third parties. In addition, persistent data masking allows you to maintain the original characteristics of the data to ensure test data quality.
In production environments, you may want to use a combination of encryption and dynamic data masking. This is the case if you would like to ensure that all data at rest is protected against unauthorized users, yet you need to protect sensitive fields only for certain sets of authorized or privileged users, but the rest of your users should be able to view the data in the clear.
The best method or combination of methods will depend on each scenario and set of requirements for your environment and organization. As with any technology and solution, there is no one size fits all.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part I
- Which types of data should be protected?
- Which data should be classified as “sensitive?”
- Where is this sensitive data located?
- Which groups of users should have access to this data?
Because these questions come up frequently, it seems ideal to share a few guidelines on this topic.
When protecting the confidentiality and integrity of data, the first level of defense is Authentication and access control. However, data with higher levels of sensitivity or confidentiality may require additional levels of protection, beyond regular authentication and authorization methods.
There are a number of control methods for securing sensitive data available in the market today, including:
- Persistent (Static) Data Masking
- Dynamic Data Masking
- Retention management and purging
Encryption is a cryptographic method of encoding data. There are generally, two methods of encryption: symmetric (using single secret key) and asymmetric (using public and private keys). Although there are methods of deciphering encrypted information without possessing the key, a good encryption algorithm makes it very difficult to decode the encrypted data without knowledge of the key. Key management is usually a key concern with this method of control. Encryption is ideal for mass protection of data (e.g. an entire data file, table, partition, etc.) against unauthorized users.
Persistent or static data masking obfuscates data at rest in storage. There is usually no way to retrieve the original data – the data is permanently masked. There are multiple techniques for masking data, including: shuffling, substitution, aging, encryption, domain-specific masking (e.g. email address, IP address, credit card, etc.), dictionary lookup, randomization, etc. Depending on the technique, there may be ways to perform reverse masking – this should be used sparingly. Persistent masking is ideal for cases where all users should not see the original sensitive data (e.g. for test / development environments) and field level data protection is required.
Dynamic data masking de-identifies data when it is accessed. The original data is still stored in the database. Dynamic data masking (DDM) acts as a proxy between the application and database and rewrites the user / application request against the database depending on whether the user has the privilege to view the data or not. If the requested data is not sensitive or the user is a privileged user who has the permission to access the sensitive data, then the DDM proxy passes the request to the database without modification, and the result set is returned to the user in the clear. If the data is sensitive and the user does not have the privilege to view the data, then the DDM proxy rewrites the request to include a masking function and passes the request to the database to execute. The result is returned to the user with the sensitive data masked. Dynamic data masking is ideal for protecting sensitive fields in production systems where application changes are difficult or disruptive to implement and performance / response time is of high importance.
Tokenization substitutes a sensitive data element with a non-sensitive data element or token. The first generation tokenization system requires a token server and a database to store the original sensitive data. The mapping from the clear text to the token makes it very difficult to reverse the token back to the original data without the token system. The existence of a token server and database storing the original sensitive data renders the token server and mapping database as a potential point of security vulnerability, bottleneck for scalability, and single point of failure. Next generation tokenization systems have addressed these weaknesses. However, tokenization does require changes to the application layer to tokenize and detokenize when the sensitive data is accessed. Tokenization can be used in production systems to protect sensitive data at rest in the database store, when changes to the application layer can be made relatively easily to perform the tokenization / detokenization operations.
Retention management and purging is more of a data management method to ensure that data is retained only as long as necessary. The best method of reducing data privacy risk is to eliminate the sensitive data. Therefore, appropriate retention, archiving, and purging policies should be applied to reduce the privacy and legal risks of holding on to sensitive data for too long. Retention management and purging is a data management best practices that should always be put to use.
CIOs and CFOs both dig data security
In my discussions with CIOs over the last couple of months, I asked them about the importance of a series of topics. All of them placed data security at the top of their IT priority list. Even their CFO counterparts, with whom they do not always see eye to eye, said they were very concerned about the business risk for corporate data. These CFOs said that they touch, as a part of owning business risk, security — especially from hacking. One CFO said that he worried, as well, about the impact of data security for compliance issues, including HIPAA and SOX. Another said this: “The security of data is becoming more and more important. The auditors are going after this. CFOs, for this reason, are really worried about getting hacked. This is a whole new direction, but some of the highly publicized recent hacks have scared a lot of folks and they combined represent to many of us a watershed event.”
According to David W. Owens the editor of CFO Magazine, even if you are using “secure” storage, such as internal drives and private clouds, the access to these areas can be anything but secure. Practically any employee can be carrying around sensitive financial and performance data in his or her pocket, at any time.” Obviously, new forms of data access have created new forms of data risk.
Are some retailers really leaving the keys in the ignition?
Given the like mind set from CIOs and CFOs, I was shocked to learn that some of the recently hacked retailers had been using outdated security software, which may have given hackers easier access company payment data systems. Most amazingly, some retailers had not even encrypted their customer payment data. Because of this, hackers were able to hide on the network for months and steal payment data, as customers continued to use their credit cards at the company’s point of sale locations.
Why weren’t these transactions encrypted or masked? In my 1998 financial information start-up, we encrypted our databases to protect against hacks of our customers’ personal financial data. One answer came from a discussion with a Fortune 100 Insurance CIO. This CIO said “CIO’s/CTO’s/CISO’s struggle with selling the value of these investment because the C Suite is only interested in hearing about investments with a direct impact on business outcomes and benefits”.
Enterprise security drives enterprise brand today
So how should leaders better argue the business case for security investments? I want to suggest that the value of IT is its “brand promise”. For retailers, in particular, if a past purchase decision creates a perceived personal data security risk, IT becomes a liability to their corporations brand equity and potentially creates a negative impact on future sales. Increasingly how these factors are managed either supports or not the value of a company’s brand.
My message is this: Spend whatever it takes to protect your brand equity; Otherwise a security issue will become a revenue issue.
In sum, this means organizations that want to differentiate themselves and avoid becoming a brand liability need to further invest in their data centric security strategy and of course, encryption. The game is no longer just about securing particular applications. IT organizations need to take a data centric approach to securing customer data and other types of enterprise data. Enterprise level data governance rules needs to be a requirement. A data centric approach can mitigate business risk by helping organizations to understand where sensitive data is and to protect it in motion and at rest.
Solutions: Enterprise Level Data Security
The State of Data Centric Security
How Is The CIO Role Starting To Change?
The CFO viewpoint on data
CFOs discuss their technology priorities
The information security industry is reporting that more than 1.5 billion (yes, that’s with a “B”) emails and passwords have been hacked. It’s hard to tell from the article, but this could be the big one. (And just when we thought that James Bond had taken care of the Russian mafia.) From both large and small companies, nobody is safe. According to the experts the sites ranged from small e-commerce sites to Fortune 500 companies. At this time the experts aren’t telling us who the big targets were. We could be very unpleasantly surprised.
Most security experts admit that the bulk of the post-breach activity will be email spamming. Insidious to be sure. But imagine if the hackers were to get a little more intelligent about what they have. How many individuals reuse passwords? Experts say over 90% of consumers reuse passwords between popular sites. And since email addresses are the most universally used “user name” on those sites, the chance of that 1.5 billion identities translating into millions of pirated activities is fairly high.
According to the recent published Ponemon study; 24% of respondents don’t know where their sensitive data is stored. That is a staggering amount. Further complicating the issue, the same study notes that 65% of the respondents have no comprehensive data forensics capability. That means that consumers are more than likely to never hear from their provider that their data had been breached. Until it is too late.
So now I guess we all get to go change our passwords again. And we don’t know why, we just have to. This is annoying. But it’s not a permanent fix to have consumers constantly looking over their virtual shoulders. Let’s talk about the enterprise sized firms first. Ponemon indicates that 57% of respondents would like more trained data security personnel to protect data. And the enterprise firm should have the resources to task IT personnel to protect data. They also have the ability to license best in class technology to protect data. There is no excuse not to implement an enterprise data masking technology. This should be used hand in hand with network intrusion defenses to protect from end to end.
Smaller enterprises have similar options. The same data masking technology can be leveraged on smaller scale by a smaller IT organization including the personnel to optimize the infrastructure. Additionally, most small enterprises leverage Cloud based systems that should have the same defenses in place. The small enterprise should bias their buying criteria in data systems for those that implement data masking technology.
Let me add a little fuel to the fire and talk about a different kind of cost. Insurers cover Cyber Risk either as part of a Commercial General Liability policy or as a separate policy. In 2013, insurers paid an average approaching $3.5M for each cyber breach claim. The average per record cost of claims was over $6,000. Now, imagine your enterprise’s slice of those 1.5 billion records. Obviously these are claims, not premiums. Premiums can range up to $40,000 per year for each $1M in coverage. Insurers will typically give discounts for those companies that have demonstrated security practices and infrastructure. I won’t belabor the point, it’s pure math at this point.
There is plenty of risk and cost to go around, to be sure. But there is a way to stay protected with Informatica. And now, let’s all take a few minutes to go change our passwords. I’ll wait right here. There, do you feel better?
For more information on Informatica’s data masking technology click here, where you can drill into dynamic and persistent data masking technology, leading in the industry. So you should still change your passwords…but check out the industry’s leading data security technology after you do.
I recently met with a longtime colleague from the Oracle E-Business Suite implementation eco-system, now VP of IT for a global technology provider. This individual has successfully implemented data archiving and data masking technologies to eliminate duplicate applications and control the costs of data growth – saving tens of millions of dollars. He has freed up resources that were re-deployed within new innovative projects such as Big Data – giving him the reputation as a thought leader. In addition, he has avoided exposing sensitive data in application development activities by securing it with data masking technology – thus securing his reputation.
When I asked him about those projects and the impact on his career, he responded, ‘Data archiving and data security are table stakes in the Oracle Applications IT game. However, if I want to be a part of anything important, it has to involve Cloud and Big Data.’ He further explained how the savings achieved from Informatica Data Archive enabled him to increase employee retention rates because he was able to fund an exciting Hadoop project that key resources wanted to work on. Not to mention, as he transitioned from physical infrastructure to a virtual server by retiring legacy applications – he had accomplished his first step on his ‘journey to the cloud’. This would not have been possible if his data required technology that was not supported in the cloud. If he hadn’t secured sensitive data and had experienced a breach, he would be looking for a new job in a new industry.
Not long after, I attended a CIO summit where the theme of the conference was ‘Breakthrough Innovation’. Of course, Cloud and Big Data were main stage topics – not just about the technology, but about how it was used to solve business challenges and provide services to the new generation of ‘entitled’ consumers. This is the description of those who expect to have everything at their fingertips. They want to be empowered to share or not share their information. They expect that if you are going to save their personal information, it will not be abused. Lastly, they may even expect to try a product or service for free before committing to buy.
In order to size up to these expectations, Application Owners, like my long-time colleague, need to incorporate Data Archive and Data Masking in their standard SDLC processes. Without Data Archive, IT budgets may be consumed by supporting old applications and mountains of data, thereby becoming inaccessible for new innovative projects. Without Data Masking, a public breach will drive many consumers elsewhere.
- The RSA conference took place in San Francisco from February 24-28, 2014
- The IAPP Global Privacy Summit took place Washington, DC from March 5-7, 2014
Data Privacy at the 2014 RSA Conference
The RSA conference was busy as expected, with over 30,000 attendees. Informatica co-sponsored an after-hours event with one of our partners, Imperva, at the Dark Circus. The event was standing room only and provided a great escape from the torrential rain. One highlight of RSA, for Informatica, is that we were honored with two of the 2014 Security Products Guide Awards:
- Informatica Dynamic Data Masking won the Gold Award for Database Security, Data Leakage Prevention/Extrusion Prevention
- Informatica Cloud Test Data Management and Security won the Bronze Award for New Products
Of particular interest to us was the growing recognition of data-centric security and privacy at RSA. I briefly met Bob Rudis, co-author of “Data Driven Security” which was featured at the onsite bookstore. In the book, Rudis has presented a great case for focusing on data as the center-point of security, through data analysis and visualization. From Informatica’s perspective, we also believe that a deep understanding of data and its relationships will escalate as a key driver of security policies and measures.
Data Privacy at the IAPP Global Privacy Summit
The IAPP Global Privacy Summit was an amazing event, small (2,500), but completely sold-out and overflowing its current venue. We exhibited and had the opportunity to meet CPOs, privacy, risk/compliance and security professionals from around the world, and had hundreds of conversations about the role of data discovery and masking for privacy. From the privacy perspective, it is all about finding, de-identification and protection of PII, PCI and PHI. These privacy professionals have extensive legal and/or data security backgrounds and understand the need to safeguard privacy by using data masking. Many notable themes were present at IAPP:
- De-identification is a key topic area
- Concerns about outsourcing and contractors in application development and testing have driven test data management adoption
- No national US privacy regulations expected in the short-term
- Europe has active but uneven privacy enforcement (France: “name and shame”, UK: heavy fines, Spain; most active)
If you want to learn more about data privacy and security, you will find no better place than Informatica World 2014. There, you’ll learn about the latest data security trends, see updates to Informatica’s data privacy and security offerings, and find out how Informatica protects sensitive information in real time without requiring costly, time-consuming changes to applications and databases. Register TODAY!