Category Archives: Data masking
This is the first in a series of articles where I will take an in-depth look at how state and local governments are affected by data breaches and what they should be considering as part of their compliance, risk-avoidance and remediation plans.
Each state has one or more agencies that are focused on the lives, physical and mental health and overall welfare of their citizens. The mission statement of the Department of Public Welfare of Pennsylvania, my home state, is typical. It reads: “Our vision is to see Pennsylvanians living safe, healthy and independent lives. Our mission is to improve the quality of life for Pennsylvania’s individuals and families. We promote opportunities for independence through services and supports while demonstrating accountability for taxpayer resources.”
Just as in the enterprise, over the last couple of decades the way an agency deals with citizens has changed dramatically. No longer is everything paper-based and manually intensive – each state has made enormous efforts not just to automate more and more of their processes but more lately to put everything online. The combination of these two factors has led to the situation where just about everything a state knows about each citizen is stored in numerous databases, data warehouses and of course accessed through the Web.
It’s interesting that in the PA mission statement two of the three focus areas are safety and health. I am sure that when written these were meant in the physical sense. We now have to consider what each state is doing to safeguard and promote the digital safety and health of its citizens. You might ask what digital safety and health means. At the highest level this is quite straightforward: each state must ensure the data it holds about its citizens is safe from inadvertent or deliberate exposure or disclosure. It seems that each week we read about another data breach (see the high-profile data breach infographic), whether an accidental loss (a stolen laptop, for instance) or a deliberate one (hacking, for example) of data about people, the citizens. Often that data includes contents that can be used to identify individuals, and once an individual citizen is identified they are at risk of identity theft, credit card fraud or worse.
Of the 50 states, 46 now have laws and regulations in place about when and how data breaches or losses must be reported. This is all well and good, but it is a bit like shutting the stable door after the horse has bolted, only with higher stakes, as there are potentially dire consequences for the digital safety and health of their citizens.
In the next article I will look at the numerous areas that are often overlooked when states establish and execute their data protection and data privacy plans.
A data integration hub is a proven vehicle for providing a self-service model for publishing and subscribing to data, making it available to a variety of users. Those who deploy these environments for regulated and sensitive data need to think about data privacy and data governance during the design phase of the project.
In the data integration hub architecture, think about how sensitive data will be coming from different locations, from a variety of technology platforms, and certainly from systems being managed by teams with a wide range of data security skills. How can you ensure data will be protected across such a heterogeneous environment, especially if data crosses national boundaries?
Then think about testing connectivity. If data needs to be validated in a data quality rules engine, truly testing this connectivity requires the ability to test with valid data. However, testers should not have access to or visibility into the actual data itself if it is classified as sensitive or confidential.
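To make the testing point concrete, here is a minimal Python sketch of format-preserving masking: the masked value still passes a data quality rule, but the tester never sees the original. Everything named here (`MASKING_KEY`, `mask_digits`, the SSN rule) is a hypothetical illustration, not part of any product mentioned in this post, and a real deployment would need proper key management.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration only; a real system would fetch this
# from a secrets manager rather than hard-coding it.
MASKING_KEY = b"demo-masking-key"

# Example data quality rule: a US SSN written as NNN-NN-NNNN.
SSN_RULE = re.compile(r"\d{3}-\d{2}-\d{4}")


def mask_digits(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace every ASCII digit with a keyed pseudorandom digit.

    Punctuation and length are preserved, and the same input always
    produces the same output, so joins across systems keep working.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    position = 0
    for ch in value:
        if ch in "0123456789":
            masked.append(str(digest[position % len(digest)] % 10))
            position += 1
        else:
            masked.append(ch)
    return "".join(masked)


def passes_quality_rule(ssn: str) -> bool:
    """Return True when the value satisfies the example SSN format rule."""
    return SSN_RULE.fullmatch(ssn) is not None


masked = mask_digits("987-65-4329")
print(passes_quality_rule(masked))  # the masked value keeps a valid format
```

Because the masking is deterministic, the same source value masks to the same test value wherever it appears, which is what lets connectivity and rule tests behave as they would on production data.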
With a hub and spoke model, the rules are difficult to enforce if data is being requested from one country and received in another. The opportunity for human error and potential data leakage increases sharply. Rather than reading about a breach in the headlines, it may make sense to build preventative measures and spend the time and money to do the right thing from the onset of the project.
Technologies designed to prevent this very type of exposure exist in the market and are easy to implement. This class of technology is called data masking, which includes data obfuscation, encryption and tokenization. Informatica’s Data Privacy solution, based on persistent and dynamic data masking options, can be easily and quickly deployed without the need to develop code or modify the source or target application.
When developing your reference architecture for a data integration hub, incorporate sound data governance policies and build data privacy into the application upfront. Don’t wait for the headlines to include your company and someone’s personal data.
Many Salesforce developers who use sandbox environments for test and development suffer from the following challenges:
- Lack of relevant data for proper testing and development (empty sandboxes)
- To fix that problem, they manually copy data from production
- Which results in exposing sensitive data to unauthorized users
- And potentially consuming more storage than allocated for sandbox environments (resulting in unexpected costs)
To address these challenges, Informatica just released Cloud Test Data Management for Salesforce. This solution is designed to give Salesforce admins and developers the ability to provision secure test data subsets to developers through an easy-to-use, wizard-driven approach. The application is delivered as a service through a subscription-based pricing model.
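For readers who want to see the general idea of a "secure test data subset", here is a minimal Python sketch. It is not how the Informatica service works internally. The record layout (`id`, `account_id`, `email`) and the function names are assumptions made up for this example and do not reflect the actual Salesforce schema.

```python
import hashlib
import random


def mask_email(email: str) -> str:
    """Swap a real address for a stable fake one on a reserved test domain."""
    digest = hashlib.sha256(email.lower().encode("utf-8")).hexdigest()[:10]
    # example.com is reserved for documentation, so a test run cannot
    # accidentally deliver mail to a real customer.
    return f"user_{digest}@example.com"


def build_subset(accounts, contacts, sample_size, seed=42):
    """Pick a sample of parent records plus only their child records.

    Keeping children that belong to the sampled parents preserves
    referential integrity while keeping the sandbox small.
    The input lists are not modified.
    """
    rng = random.Random(seed)
    chosen = rng.sample(accounts, min(sample_size, len(accounts)))
    chosen_ids = {account["id"] for account in chosen}
    subset_contacts = [
        {**contact, "email": mask_email(contact["email"])}
        for contact in contacts
        if contact["account_id"] in chosen_ids
    ]
    return [dict(account) for account in chosen], subset_contacts
```

Sampling parents first and then pulling only their children addresses the storage problem, and masking during the copy addresses the exposure problem, in a single pass.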
The Informatica IT team uses Salesforce internally and validated an ROI based on reducing the amount of developer time used to manually script copying data from production to a sandbox, reducing the amount of time fixing defects due to not having the right test data, and eliminating the risk of a data breach by masking sensitive data.
To learn more about this new offering, watch a demonstration that shows how to create secure test data subsets for Salesforce. Also available now: try the free Cloud Data Masking app or take a 30-day Cloud Test Data Management trial.
Informatica recently hosted a webinar with Cognizant, which shared how it streamlines test data management processes internally with Informatica Test Data Management and passes on the benefits to its customers. Proclaimed as the world’s largest Quality Engineering and Assurance (QE&A) service provider, Cognizant has over 400 customers and thousands of testers and is considered a thought leader in the testing practice.
We polled over 100 attendees on their top challenges with test data management, considering the data and system complexities and the need to protect their clients’ sensitive data. Here are the results from that poll:
It was not surprising to see that generating test data sets and securing sensitive data in non-production environments were tied as the top two challenges. Data integrity/synchronization was a very close third.
Cognizant with Informatica has been evolving its test data management offering to truly focus on not only securing sensitive data – but also improving testing efficiencies with identifying, provisioning and resetting test data – tasks that consume as much as 40% of testing cycle times. As part of the next generation test data management platform, key components of that solution include:
Sensitive Data Discovery – an integrated and automated process that searches data sets looking for exposed sensitive data. Many times, sensitive data resides in test copies unbeknownst to auditors. Once data has been located, data can be masked in non-production copies.
Persistent Data Masking – masks sensitive data in-flight while cloning data from production or in-place on a gold copy. Data formats are preserved while original values are completely protected.
Data Privacy Compliance Validation – auditors want to know that data has in fact been protected, so the ability to validate and report on data privacy compliance becomes critical.
Test Data Management – in addition to creating test data subsets, clients require the ability to synthetically generate test data sets to eliminate defects by having data sets aligned to optimize each test case. Also, in many cases, multiple testers work on the same environment and may clobber each other’s test data sets. Having the ability to reset test data becomes a key requirement to improve efficiencies.
Figure 2 Next Generation Test Data Management
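As a rough illustration of the sensitive data discovery component described above, the Python sketch below scans rows for values that look like SSNs or email addresses and reports which columns contain them. The patterns and the function name `discover_sensitive_columns` are assumptions for this example. Real discovery tools use far richer rules, and pattern matching alone will produce both false positives and misses.

```python
import re

# Illustrative patterns only; production discovery needs broader coverage.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def discover_sensitive_columns(rows):
    """Return {column name: set of pattern labels found in that column}."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only free-text and string fields are scanned here
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(label)
    return findings


rows = [
    {
        "id": 1,
        "notes": "contact jane.doe@corp.test for access",
        "tax_id": "987-65-4329",
        "city": "Harrisburg",
    },
]
print(discover_sensitive_columns(rows))
```

The same scan can be rerun on a masked copy as a simple compliance check: if a free-text column such as `notes` still reports `email`, the masking rules missed something.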
When asked what tools or services they had deployed, 78% said in-house developed scripts/utilities. This is an incredibly time-consuming approach and one that has limited repeatability. Data masking had been deployed by almost half of the respondents.
Informatica and Cognizant are leading the way in establishing a new standard for Test Data Management by incorporating test data generation, data masking, and the ability to refresh or reset test data sets. For more information, check out Cognizant’s offering based on Informatica: TDMaxim and White Paper: Transforming Test Data Management for Increased Business Value.
In recent conversations regarding solutions to implement for data privacy, our Dynamic Data Masking team put together the following table to highlight the differences between encryption / tokenization and Dynamic Data Masking (DDM). Best practices dictate that both should be implemented in an enterprise for the most comprehensive and complete data security strategy. For the purpose of this blog, here are a few definitions:
Dynamic Data Masking (DDM) protects sensitive data when it is retrieved based on policy without requiring the data to be altered when it is stored persistently. Authorized users will see true data, unauthorized users will see masked values in the application. No coding is required in the source application.
Encryption / tokenization protects sensitive data by altering its values when stored persistently while being able to decrypt and present the original values when requested by authorized users. The user is validated by a separate service which then provides a decryption key. Unauthorized users will only see the encrypted values. In many cases, applications need to be altered requiring development work.
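The two definitions can be contrasted in a small Python sketch. This is a simplified model under stated assumptions, not a description of any vendor’s implementation: `TokenVault`, `dynamic_mask`, the role names, and the choice to reveal the last four digits are all hypothetical, and the separate authorization service from the definition is reduced to a boolean flag.

```python
import secrets


class TokenVault:
    """Tokenization: the stored value is replaced and must be mapped back."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        """Return the token for a value, creating one on first use."""
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str, user_is_authorized: bool) -> str:
        """Authorized callers get the original; others only see the token."""
        if not user_is_authorized:
            return token
        return self._token_to_value[token]


# Hypothetical policy: roles allowed to see real values.
AUTHORIZED_ROLES = {"case_worker"}


def dynamic_mask(record: dict, role: str) -> dict:
    """DDM: stored data is untouched; only the returned copy is masked."""
    result = dict(record)
    if role not in AUTHORIZED_ROLES:
        result["ssn"] = "XXX-XX-" + record["ssn"][-4:]
    return result


stored = {"name": "Pat Example", "ssn": "987-65-4329"}
print(dynamic_mask(stored, "offshore_support")["ssn"])  # XXX-XX-4329
print(dynamic_mask(stored, "case_worker")["ssn"])       # 987-65-4329
```

The difference in where the protection lives is visible in the code: the vault changes what is stored, while `dynamic_mask` changes only what a given role gets back.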
|Scenario||Encryption / tokenization||Dynamic Data Masking (DDM)|
|Business users access PII||Business users work with actual SSN and personal values in the clear (not with tokenized values). As the data is tokenized in the database, it needs to be de-tokenized every time it is accessed by users, which is done by changing the application source code (imposing costs and risks) and causes a performance penalty. For example, if a user needs to retrieve information on a client with SSN = ‘987-65-4329’, the application needs to de-tokenize the entire tokenized SSN column to identify the correct client info, a costly operation. This is why implementation scope is limited.||As DDM does not change the data in the database, but only masks it when accessed by unauthorized users, authorized users do not experience any performance hit and no application source-code changes are required. For example, if an authorized user needs to retrieve information on a client with SSN = ‘987-65-4329’, the request is untouched by DDM. As the SSN stored in the database is not changed, there is no performance penalty involved. If an unauthorized user retrieves the same SSN, DDM masks the SQL request, causing the sensitive data result (e.g., name, address, CC and age) to be masked, hidden or completely blocked.|
|Privileged infrastructure DBAs have access to the database server files||Personally Identifiable Information (PII) stored in the database files is tokenized, ensuring that the few administrators who have uncontrolled access to the database servers cannot see it.||PII stored in the database files remains in the clear. The few administrators who have uncontrolled access to the database servers can potentially access it.|
|Production support, application developers, DBAs, consultants, outsource and offshore teams||These groups of users have application super-user privileges, are seen by the tokenization solution as authorized, and as such access PII in the clear!||These users are identified by DDM as unauthorized, and as such the data they see is masked, hidden or blocked, protecting the PII.|
|Data warehouse protection||Implementing tokenization on data warehouses requires tedious database changes and causes a performance penalty: 1. Loading or reporting on millions of PII records requires tokenizing/de-tokenizing each record. 2. Running a report with a condition on a tokenized value (e.g., a condition such as SSN like ‘%333’) causes the de-tokenization of the entire column. Massive database configuration changes are required to use the tokenization API, creating and maintaining hundreds of views.||No performance penalty. No need to change reports or databases, or to create views.|
Combining both DDM and encryption/tokenization presents an opportunity to deliver complete data privacy without the need to alter the application or write any code.
Informatica works with its encryption and tokenization partners to deliver comprehensive data privacy protection in packaged applications, data warehouses and Big Data platforms such as Hadoop.
Informatica’s Vibe virtual data machine can streamline big data work and allow data scientists to be more efficient
Informatica introduced an embeddable Vibe engine for not only transformation, but also for data quality, data profiling, data masking and a host of other data integration tasks. It will have a meaningful impact on the data scientist shortage.
Some clear economic facts are already apparent in the current world of data. Hadoop provides a significantly less expensive platform for gathering and analyzing data; cloud computing (potentially) is a more economical computing location than on-premises, if managed well. These are clearly positive developments. On the other hand, the human resources required to exploit these new opportunities are actually quite expensive. When there is greater demand than can be met in the short term for a hot product, suppliers put customers “on allocation” to manage the distribution to the most strategic customers.
This is the situation with “data scientists,” this new breed of experts with quantitative skills, data management skills, presentation skills and deep domain expertise. Current estimates are that there are 60,000 – 120,000 unfilled positions in the US alone. Naturally, data scientists are “allocated” to the most critical (economically lucrative) efforts, and their time is limited to those tasks that most completely leverage their unique skills.
To address this shortage, industry turns to universities to develop curricula to manufacture data scientists, but this will take time. In the meantime, salaries for data scientists are very high. Unfortunately, most data science work involves a great deal of effort that does not require data science skills, especially in the areas of managing the data prior to the insightful analytics. Some estimates are that data scientists spend 50-80% of their time finding and cleaning data, managing their computing platforms and writing programs. Reducing this effort with better tools can not only make data scientists more effective, it can also have an impact on the most expensive component of big data: human resources.
Informatica today introduced Vibe, its embeddable virtual data machine to do exactly that. Informatica has, for over 20 years, provided tools that allow developers to design and execute transformation of data without the need for writing or maintaining code. With Vibe, this capability is extended to include data quality, masking and profiling and the engine itself can be embedded in the platforms where the work is performed. In addition, the engine can generate separate code from a single data management design.
In the case of Hadoop, Informatica designers can continue to operate in the familiar design studio, and have Vibe generate the code for whatever platform is needed. In this way, it is possible for an Informatica developer to develop these data management routines for Hadoop without learning Hadoop or writing code in Java. And the real advantage is that the data scientist is freed from work that can be performed by those in lower pay grades, and that work can be parallelized too: multiple programmers and integration developers to one data scientist.
Vibe is a major innovation for Informatica that provides many interesting opportunities for its customers. Easing the data scientist problem is only one.
This is a guest blog penned by Neil Raden, a well-known industry figure as an author, lecturer and practitioner. He has in-depth experience as a developer, consultant and analyst in all areas of Analytics and Decision Services including Big Data strategy and implementation, Business Intelligence, Data Warehousing, Statistical/Predictive Modeling, Decision Management, and IT systems integration including assessment, architecture, planning, project management and execution. Neil has authored dozens of sponsored white papers and articles, is a blogger, and is co-author of “Smart (Enough) Systems” (Prentice Hall, 2007). He has 25 years of experience as an actuary, software engineer and systems integrator.
Join us this year at Informatica World!
We have a great line up of speakers and events to help you become a data driven healthcare organization… I’ve provided a few highlights below:
Participate in the Informatica World Keynote sessions with Sohaib Abbasi and Rick Smolan who wrote “The Human Face of Big Data” — learn more via this quick YouTube video: http://www.youtube.com/watch?v=7K5d9ArRLJE&feature=player_embedded
With more than 100 interactive and in-depth breakout sessions spanning 6 different tracks (Platform & Products, Architecture, Best Practices, Big Data, Hybrid IT and Tech Talk), Informatica World is an excellent way to ensure you are getting the most from your Informatica investment. Learn best practices from organizations that are realizing the potential of their data, like Ochsner Health, Sutter Health, UMass Memorial, Qualcomm and Paypal.
Finally, we want you to balance work with a little play… we invite you to network with industry peers at our Healthcare Cocktail Reception on the evening of Wednesday, June 5th and again during our Data Driven Healthcare Breakfast Roundtable on Thursday, June 6th.
See you there!
According to analysts, users spend the majority of the application development lifecycle in development and testing and the least amount of time in quality management and documentation. This is probably not very shocking to anyone in QA or on a testing team. But how much time is actually spent on test data management? In a recent webinar, more than half of the listeners polled said they spend 30-40% of their effort on ‘data related tasks.’
Last night Informatica was given the Silver award for Best Security Software by Info Security. The Best Security Software was one of the most competitive categories—with 8 finalists offering technologies ranging from mobile to cloud security.
Informatica won the award for its new Cloud Data Masking solution. Starting in June of last year, Informatica has steadily released a series of new Cloud solutions for data security. Informatica is the first to offer a comprehensive, data governance based solution for cloud data privacy. This solution addresses the full lifecycle of data privacy, including:
- Defining and classifying sensitive data
- Discovering where sensitive data lives
- Applying consistent data masking rules
- Measuring and monitoring to prove compliance
Cloud Data Masking adds to Informatica’s leading cloud integration solution for salesforce.com, which includes data synchronization, data replication, data quality, and master data management.
Why is Cloud Data Masking important?
Sensitive data is at risk of being exposed during application development and testing, where it is important to use real production data to rigorously test applications. As reported by the Ponemon Institute, a data breach costs organizations $5.5 million on average.
What does Cloud Data Masking do?
Based on Informatica’s market leading Data Masking technology, Informatica’s new Cloud Data Masking enables cloud customers to secure sensitive information during the testing phase by directly masking production data used within cloud sandboxes, creating realistic-looking, but de-identified data. Customers are therefore able to protect sensitive information from unintended exposure during development, test and training activities; streamline cloud projects by reducing the time it takes to mask test/training/development environments; and ensure compliance with mounting privacy regulations.
What do people do today?
Many organizations today will hand the masking efforts over to IT. This inevitably lengthens development cycles and delays releases. One of Informatica’s longtime customers and current partners, David Cheung of Cloud Sherpas, stated “Many customers wait days for IT to change the sensitive or confidential data, delaying releases. For example, I was at customer last week where the customer was waiting 5 days for IT to mask the sensitive data.”
Others use scripting or manual methods to mask the data. One prospect I spoke to recently said he manually altered the data but missed a few email addresses. So during a test run, the company accidentally sent emails to customers. These customers called back, demanding to know what was going on. Do you want that to happen to you?
Visit Informatica Cloud Data Masking for more information.