Tag Archives: Data Privacy
- A loss of customer trust
- Revenue shortfalls
- A plummeting stock price
- C-level executives losing their jobs
As a result, Data security and privacy has become a key topic of discussion, not just in IT meetings, but in the media and the boardroom.
Preventing access to sensitive data has become more complex than ever before. There are new potential entry points that IT never previously considered. These new options go beyond typical BYOD user devices like smartphones and tablets. Today’s entry points can be much smaller: Things like HVAC controllers, office polycoms and temperature control systems.
So what can organizations do to combat this increasing complexity? Traditional data security practices focus on securing both the perimeter and the endpoints. However, these practices are clearly no longer working and no longer manageable. Not only is the number and type of devices expanding, but the perimeter itself is no longer present. As companies increasingly outsource, off-shore and move operations to the cloud, it is no longer possible fence the perimeters and to keep intruders out. Because 3rd parties often require some form of access, even trusted user credentials may fall into the hands of malicious intruders.
Data security requires a new approach. It must use policies to follow the data and to protect it, regardless of where it is located and where it moves. Informatica is responding to this need. We are leveraging our market leadership and domain expertise in data management and security. We are defining a new data security offering and category. This week, we unveiled our entry into the Data Security market at our Informatica World conference. Our new security offering, Secure@Source™ will allow enterprises to discover, detect and protect sensitive data.
The first step towards protecting sensitive data is to locate and identify them. So Secure@Source™ first allows you discover where all the sensitive data are located in the enterprise and classify them. As part of the discovery, Secure@source also analyzes where sensitive data is being proliferated, who has access to the data, who are actually accessing them and whether the data is protected or unprotected when accessed. Secure@Source™ leverages Informatica’s PowerCenter repository and lineage technology to perform a first pass, quick discovery with a more in depth analysis and profiling over time. The solution allows you to determine the privacy risk index of your enterprise and slice and dice the analysis based on region, departments, organization hierarchy, as well as data classifications.
The longer term vision of Secure@Source™ will allow you to detect suspicious usage patterns and orchestrate the appropriate data protection method, such as: alerting, blocking, archiving and purging, dynamically masking, persistently masking, encrypting, and/or tokenizing the data. The data protection method will depend on whether the data store is a production or non-production system, and whether you would like to de-identify sensitive data across all users or only for some users. All can be deployed based on policies. Secure@Source™ is intended to be an open framework for aggregating data security analytics and will integrate with key partners to provide a comprehensive visibility and assessment of an enterprise data privacy risk.
Secure@Source™ is targeted for beta at the end of 2014 and general availability in early 2015. Informatica is recruiting a select group of charter customers to drive and provide feedback for the first release. Customers who are interested in being a charter customer should register and send email to SecureCustomers@informatica.com.
Data security breaches continue to escalate. Privacy legislation and enforcement is tightening and analysts have begun making dire predictions in regards to cyber security’s effectiveness. But there is more – Trusted insiders continue to be the major threat. In addition, most executives cannot identify the information they are trying to protect.
Data security is a senior management concern, not exclusive to IT. With this in mind, what is the next step CxOs must take to counter these breaches?
A new approach to Data Security
It is clear that a new approach is needed. This should focus on answering fundamental, but difficult and precise questions in regards to your data:
- What data should I be concerned about?
- Can I create re-usable rules for identifying and locating sensitive data in my organization?
- Can I do so both logically and physically?
- What is the source of the sensitive data and where is it consumed?
- What are the sensitive data relationships and proliferation?
- How is it protected? How should it be protected?
- How can I integrate data protection with my existing cyber security infrastructure?
The answers to these questions will help guide precise data security measures in order to protect the most valuable data. The answers need to be presented in an intuitive fashion, leveraging simple, yet revealing graphics and visualizations of your sensitive data risks and vulnerabilities.
At Informatica World 2014, Informatica will unveil its vision to help organizations address these concerns. This vision will assist in the development of precise security measures designed to counter the growing sophistication and frequency of cyber-attacks, and the ever present danger of rogue insiders.
Stay tuned, more to come from Informatica World 2014.
- The RSA conference took place in San Francisco from February 24-28, 2014
- The IAPP Global Privacy Summit took place Washington, DC from March 5-7, 2014
Data Privacy at the 2014 RSA Conference
The RSA conference was busy as expected, with over 30,000 attendees. Informatica co-sponsored an after-hours event with one of our partners, Imperva, at the Dark Circus. The event was standing room only and provided a great escape from the torrential rain. One highlight of RSA, for Informatica, is that we were honored with two of the 2014 Security Products Guide Awards:
- Informatica Dynamic Data Masking won the Gold Award for Database Security, Data Leakage Prevention/Extrusion Prevention
- Informatica Cloud Test Data Management and Security won the Bronze Award for New Products
Of particular interest to us was the growing recognition of data-centric security and privacy at RSA. I briefly met Bob Rudis, co-author of “Data Driven Security” which was featured at the onsite bookstore. In the book, Rudis has presented a great case for focusing on data as the center-point of security, through data analysis and visualization. From Informatica’s perspective, we also believe that a deep understanding of data and its relationships will escalate as a key driver of security policies and measures.
Data Privacy at the IAPP Global Privacy Summit
The IAPP Global Privacy Summit was an amazing event, small (2,500), but completely sold-out and overflowing its current venue. We exhibited and had the opportunity to meet CPOs, privacy, risk/compliance and security professionals from around the world, and had hundreds of conversations about the role of data discovery and masking for privacy. From the privacy perspective, it is all about finding, de-identification and protection of PII, PCI and PHI. These privacy professionals have extensive legal and/or data security backgrounds and understand the need to safeguard privacy by using data masking. Many notable themes were present at IAPP:
- De-identification is a key topic area
- Concerns about outsourcing and contractors in application development and testing have driven test data management adoption
- No national US privacy regulations expected in the short-term
- Europe has active but uneven privacy enforcement (France: “name and shame”, UK: heavy fines, Spain; most active)
If you want to learn more about data privacy and security, you will find no better place than Informatica World 2014. There, you’ll learn about the latest data security trends, see updates to Informatica’s data privacy and security offerings, and find out how Informatica protects sensitive information in real time without requiring costly, time-consuming changes to applications and databases. Register TODAY!
In the first two issues I spent time looking at the need for states to pay attention to the digital health and safety of their citizens, followed by the oft forgotten need to understand and protect the non-production data. This is data than has often proliferated and also ignored or forgotten about.
In many ways, non-production data is simpler to protect. Development and test systems can usually work effectively with realistic but not real PII data and realistic but not real volumes of data. On the other hand, production systems need the real production data complete with the wealth of information that enables individuals to be identified – and therefore presents a huge risk. If and when that data is compromised either deliberately or accidentally the consequences can be enormous; in the impact on the individual citizens and also the cost of remediation on the state. Many will remember the massive South Carolina data breach of late 2012 when over the course of 2 days a 74 GB database was downloaded and stolen, around 3.8 million payers and 1.9 million dependents had their social security information stolen and 3.3 million “lost” bank account details. The citizens’ pain didn’t end there, as the company South Carolina picked to help its citizens seems to have tried to exploit the situation.
The biggest problem with securing production data is that there are numerous legitimate users and uses of that data, and most often just a small number of potentially malicious or accidental attempts of inappropriate or dangerous access. So the question is… how does a state agency protect its citizens’ sensitive data while at the same time ensuring that legitimate uses and users continues – without performance impacts or any disruption of access? Obviously each state needs to make its own determination as to what approach works best for them.
This video does a good job at explaining the scope of the overall data privacy/security problems and also reviews a number of successful approaches to protecting sensitive data in both production and non-production environments. What you’ll find is that database encryption is just the start and is fine if the database is “stolen” (unless of course the key is stolen along with the data! Encryption locks the data away in the same way that a safe protects physical assets – but the same problem exists. If the key is stolen with the safe then all bets are off. Legitimate users are usually easily able deliberately breach and steal the sensitive contents, and it’s these latter occasions we need to understand and protect against. Given that the majority of data breaches are “inside jobs” we need to ensure that authorized users (end-users, DBAs, system administrators and so on) that have legitimate access only have access to the data they absolutely need, no more and no less.
So we have reached the end of the first series. In the first blog we looked at the need for states to place the same emphasis on the digital health and welfare of their citizens as they do on their physical and mental health. In the second we looked at the oft-forgotten area of non-production (development, testing, QA etc.) data. In this third and final piece we looked at the need to and some options for providing the complete protection of non-production data.
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue), on the other hand non-production systems of every sort do not. Non-production systems most often need realistic, but not real data and realistic, but not real data volumes (except maybe for the performance/stress/throughput testing system). What need to be done? Well to start with, a three point risk remediation plan would be a good place to start.
1. Understand the non-production data using sophisticated data and schema profiling combined with NLP (Natural Language Processing) techniques help to identify previously unrealized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes, this limits the size of the risk and also has positive effects on performance, run-times, backups etc.
Gartner has just published their 2013 magic quadrant for data masking this covers both what they call static (i.e. permanent or persistent masking) and dynamic (more on this in the next issue) masking. As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that from the top down state governments realize the importance and vulnerability of their citizens data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to success needs a combination of cultural and organizational change (getting people to care) and putting the right technology – together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Informatica announced, once again, that it is listed as a leader in the industry’s second Gartner Magic Quadrant for Data Masking Technology. With data security continuing to grow as one of the fastest segments in the enterprise software market, technologies such as data masking are becoming the solution of choice for data-centric security.
Increased fear of cyber-attacks and internal data breaches has made predictions that 2014 is the year of preventative and tactical measures to ensure corporate data assets are safe. Data masking should be included in those measures. According to Gartner,
“Security program managers need to take a strategic approach with tactical best-practice technology configurations in order to properly address the most common advanced targeted attack scenarios to increase both detection and prevention capabilities.”
Without these measures, the cost of an attack or breach is growing every year. The Ponemon Institute posted in a recent study:
“The 2013 Cost of Cyber Crime Study states that the average annualized cost of cybercrime incurred by a benchmark sample of US organizations was $11.56 million, nearly 78% more than the cost estimated in the first analysis conducted 4 years ago.”
Informatica believes that the best preventative measures include a layered approach for data security but without sacrificing agility or adding unnecessary costs. Data Masking delivers data-centric security with improved productivity and reduced overall costs.
Data Masking prevents internal data theft and abuse of sensitive data by hiding it from users. Data masking techniques include replacing some fields with similar-looking characters, masking characters (for example, “x”), substituting real last names with fictional last names and shuffling data within columns – to name a few. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization . Call it what you like, but without it – organizations may continue to expose sensitive data to those with mal intentions.
To learn more, Download the Gartner Magic Quadrant Data Masking Report now. And visit the Informatica website for data masking product information.
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose
A data integration hub is a proven vehicle to provide a self service model for publishing and subscribing data to be made available to a variety of users. For those who deploy these environments for regulated and sensitive data need to think of data privacy and data governance during the design phase of the project.
In the data integration hub architecture, think about how sensitive data will be coming from different locations, from a variety of technology platforms, and certainly from systems being managed by teams with a wide range of data security skills. How can you ensure data will be protected across such a heterogeneous environment? Not to mention if data traverses across national boundaries.
Then think about testing connectivity. If data needs to be validated in a data quality rules engine, in order to truly test this connectivity, there needs to be a capability to test using valid data. However testers should not have access or visibility into the actual data itself if it is classified as sensitive or confidential.
With a hub and spoke model, the rules are difficult to enforce if data is being requested from one country and received in another. The opportunity for exposing human error and potential data leakage increases exponentially. Rather than reading about a breach in the headlines, it may make sense to look at building preventative measures or spending the time and money to do the right thing from the onset of the project.
There are technologies that exist in the market that are easy to implement that are designed to prevent this very type of exposure. This technology is called data masking which includes data obfuscation, encryption and tokenization. Informatica’s Data Privacy solution based on persistent and dynamic data masking options can be easily and quickly deployed without the need to develop code or modify the source or target application.
When developing your reference architecture for a data integration hub, incorporate sound data governance policies and build data privacy into the application upfront. Don’t wait for the headlines to include your company and someone’s personal data.
Informatica recently hosted a webinar with Cognizant who shared how they streamline test data management processes internally with Informatica Test Data Management and pass on the benefits to their customers. Proclaimed as the world’s largest Quality Engineering and Assurance (QE&A) service provider, they have over 400 customers and thousands of testers and are considered a thought leader in the testing practice.
We polled over 100 attendees on what their top challenges were with test data management considering the data and system complexities and the need to protect their client’s sensitive data. Here are the results from that poll:
It was not surprising to see that generating test data sets and securing sensitive data in non-production environments were tied as the top two biggest challenges. Data integrity/synchronization was a very close 3rd .
Cognizant with Informatica has been evolving its test data management offering to truly focus on not only securing sensitive data – but also improving testing efficiencies with identifying, provisioning and resetting test data – tasks that consume as much as 40% of testing cycle times. As part of the next generation test data management platform, key components of that solution include:
Sensitive Data Discovery – an integrated and automated process that searches data sets looking for exposed sensitive data. Many times, sensitive data resides in test copies unbeknownst to auditors. Once data has been located, data can be masked in non-production copies.
Persistent Data Masking – masks sensitive data in-flight while cloning data from production or in-place on a gold copy. Data formats are preserved while original values are completely protected.
Data Privacy Compliance Validation – auditors want to know that data has in fact been protected, the ability to validate and report on data privacy compliance becomes critical.
Test Data Management – in addition to creating test data subsets, clients require the ability to synthetically generate test data sets to eliminate defects by having data sets aligned to optimize each test case. Also, in many cases, multiple testers work on the same environment and may clobber each other’s test data sets. Having the ability to reset test data becomes a key requirement to improve efficiencies.
Figure 2 Next Generation Test Data Management
When asked what tools or services that have been deployed, 78% said in-house developed scripts/utilities. This is an incredibly time-consuming approach and one that has limited repeatability. Data masking was deployed in almost half of the respondents.
Informatica with Cognizant are leading the way to establishing a new standard for Test Data Management by incorporating both test data generation, data masking, and the ability to refresh or reset test data sets. For more information, check out Cognizant’s offering based on Informatica: TDMaxim and White Paper: Transforming Test Data Management for Increased Business Value.
In recent conversations regarding solutions to implement for data privacy, our Dynamic Data Masking team put together the following table to highlight the differences between encryption / tokenization and Dynamic Data Masking (DDM). Best practices dictate that both should be implemented in an enterprise for the most comprehensive and complete data security strategy. For the purpose of this blog, here are a few definitions:
Dynamic Data Masking (DDM) protects sensitive data when it is retrieved based on policy without requiring the data to be altered when it is stored persistently. Authorized users will see true data, unauthorized users will see masked values in the application. No coding is required in the source application.
Encryption / tokenization protects sensitive data by altering its values when stored persistently while being able to decrypt and present the original values when requested by authorized users. The user is validated by a separate service which then provides a decryption key. Unauthorized users will only see the encrypted values. In many cases, applications need to be altered requiring development work.
|Business users access PII||Business users work with actual SSN and personal values in the clear (not with tokenized values). As the data is tokenized in the database, it needs to be de-tokenized every time it is accessed by users – which is done be changing the application source-code (imposing costs and risks), and causing performance penalty.For example, if a user needs to retrieve information on a client with SSN = ‘987-65-4329’, the application needs to de-tokenize the entire tokenized SSN column to identify the correct client info – a costly operation. This is why implementation scope is limited.||As DDM does not change the data in the database, but only masks it when accessed by unauthorized users, authorized users do not experience any performance hit nor require application source-code changes.For example, if an authorized user needs to retrieve information on a client with SSN = ‘987-65-4329’, his request is untouched by DDM. As the SSN stored in the database is not changed, there is no performance penalty involved.In case an unauthorized user retrieves the same SSN, DDM masks the SQL request, causing the sensitive data result (e.g., name, address, CC and age) to be masked, hidden or completely blocked.|
|Privileged Infrastructure DBA have access to the database server files||Personal Identifiable Information (PII) stored in the database files is tokenized, ensuring that the few administrators that have uncontrolled access to the database servers cannot see it||PII stored in the database files remains in the clear. The few administrators that have uncontrolled access to the database servers can potentially access it.|
|Production support, application developers, DBAs, consultants, outsource and offshore teams||These groups of users have application super-user privileges, seen by the tokenization solution as authorized, and as such access PII in the clear!!!||These users are identified by DDM as unauthorized, and as such are masked, hidden or blocked, protecting the PII.|
|Data warehouse protection||Implementing tokenization on Data warehouses requires tedious database changes and causes performance penalty:1.Loading or reporting upon millions of PII records requires to tokenize/de-tokenize each record.2.Running a report with a condition on a tokenized value (e.g., when having a condition: SSN like (‘%333’) causes the de-tokenization of the entire column).
Massive database configuration changes are required to use the tokenization API, creating and maintaining hundreds of views.
|No performance penalty.No need to change reports, databases or to create views.|
Combining both DDM and encryption/tokenization presents an opportunity to deliver complete data privacy without the need to alter the application or write any code.
Informatica works with its encryption and tokenization partners to deliver comprehensive data privacy protection in packaged applications, data warehouses and Big Data platforms such as Hadoop.
Last night Informatica was given the Silver award for Best Security Software by Info Security. The Best Security Software was one of the most competitive categories—with 8 finalists offering technologies ranging from mobile to cloud security.
Informatica won the award for its new Cloud Data Masking solution. Starting in June of last year, Informatica has steadily released a series of new Cloud solutions for data security. Informatica is the first to offer a comprehensive, data governance based solution for cloud data privacy. This solution addresses the full lifecycle of data privacy, including:
- Defining and classifying sensitive data
- Discovering where sensitive data lives
- Applying consistent data masking rules
- Measuring and monitoring to prove compliance
The Cloud Data Masking adds to Informatica’s leading cloud integration solution for salesforce.com includes data synchronization, data replication, data quality, and master data management.
Why is Cloud Data Masking important?
Sensitive data is at risk of being exposed during application development and testing, where it is important to use real production data to rigorously test applications. As reported by the Ponemon Institute, a data breach costs organizations on average $5.5 million dollars.
What does Cloud Data Masking do?
Based on Informatica’s market leading Data Masking technology, Informatica’s new Cloud Data Masking enables cloud customers to secure sensitive information during the testing phase by directly masking production data used within cloud sandboxes, creating realistic-looking, but de-identified data. Customers are therefore able to protect sensitive information from unintended exposure during development, test and training activities; streamline cloud projects by reducing the time it takes to mask test/training/development environments; and ensure compliance with mounting privacy regulations.
What do people do today?
Many organizations today will hand the masking efforts over to IT. This inevitably lengthens development cycles and delays releases. One of Informatica’s longtime customers and current partners, David Cheung of Cloud Sherpas, stated “Many customers wait days for IT to change the sensitive or confidential data, delaying releases. For example, I was at customer last week where the customer was waiting 5 days for IT to mask the sensitive data.”
Others use scripting or manual methods to mask the data. One prospect I spoke to recently said he manually altered the data but missed a few email addresses. So during a test run, the company accidentally sent emails to customers. These customers called back to demand what was going on. Do you want that to happen to you?
Visit Informatica Cloud Data Masking for more information.