Category Archives: Application ILM
- PII – Personally Identifiable Information – any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII
- GSA’s Rules of Behavior for Handling Personally Identifiable Information – This directive provides GSA’s policy on how to properly handle PII and the consequences and corrective actions that will be taken if a breach occurs
- PHI – Protected Health Information – any information about health status, provision of health care, or payment for health care that can be lined to a specific individual
- HIPAA Privacy Rule – The HIPAA Privacy Rule establishes national standards to protect individuals’ medical records and other personal health information and applies to health plans, health care clearinghouses, and those health care providers that conduct certain health care transactions electronically. The Rule requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization. The Rule also gives patients rights over their health information, including rights to examine and obtain a copy of their health records, and to request corrections.
- Encryption – a method of protecting data by scrambling it into an unreadable form. It is a systematic encoding process which is only reversible with the right key.
- Tokenization – a method of replacing sensitive data with non-sensitive placeholder tokens. These tokens are swapped with data stored in relational databases and files.
- Data masking – a process that scrambles data, either an entire database or a subset. Unlike encryption, masking is not reversible; unlike tokenization, masked data is useful for limited purposes. There are several types of data masking:
- Static data masking (SDM) masks data in advance of using it. Non production databases masked NOT in real-time.
- Dynamic data masking (DDM) masks production data in real time
- Data Redaction – masks unstructured content (PDF, Word, Excel)
Each of the three methods for protecting data (encryption, tokenization and data masking) have different benefits and work to solve different security issues . We’ll address them in a bit. For a visual representation of the three methods – please see the table below:
For protecting PHI data – encryption is superior to tokenization. You encrypt different portions of personal healthcare data under different encryption keys. Only those with the requisite keys can see the data. This form of encryption requires advanced application support to manage the different data sets to be viewed or updated by different audiences. The key management service must be very scalable to handle even a modest community of users. Record management is particularly complicated. Encryption works better than tokenization for PHI – but it does not scale well.
Properly deployed, encryption is a perfectly suitable tool for protecting PII. It can be set up to protect archived data or data residing on file systems without modification to business processes.
- To protect the data, you must install encryption and key management services to protect the data – this only protects the data from access that circumvents applications
- You can add application layer encryption to protect data in use
- This requires changing applications and databases to support the additional protection
- You will pay the cost of modification and the performance of the application will be impacted
For tokenization of PHI – there are many pieces of data which must be bundled up in different ways for many different audiences. Using the tokenized data requires it to be de-tokenized (which usually includes a decryption process). This introduces an overhead to the process. A person’s medical history is a combination of medical attributes, doctor visits, outsourced visits. It is an entangled set of personal, financial, and medical data. Different groups need access to different subsets. Each audience needs a different slice of the data – but must not see the rest of it. You need to issue a different token for each and every audience. You will need a very sophisticated token management and tracking system to divide up the data, issuing and tracking different tokens for each audience.
Masking can scramble individual data columns in different ways so that the masked data looks like the original (retaining its format and data type) but it is no longer sensitive data. Masking is effective for maintaining aggregate values across an entire database, enabling preservation of sum and average values within a data set, while changing all the individual data elements. Masking plus encryption provide a powerful combination for distribution and sharing of medical information
Traditionally, data masking has been viewed as a technique for solving a test data problem. The December 2014 Gartner Magic Quadrant Report on Data Masking Technology extends the scope of data masking to more broadly include data de-identification in production, non-production, and analytic use cases. The challenge is to do this while retaining business value in the information for consumption and use.
Masked data should be realistic and quasi-real. It should satisfy the same business rules as real data. It is very common to use masked data in test and development environments as the data looks like “real” data, but doesn’t contain any sensitive information.
I was recently talking to an executive responsible for IT infrastructure of a mid-sized bank and he mentioned that he has 17 core applications critical for business, and, over 200 ancillary applications. Each core application historically has significant changes, once a quarter and 2 small changes every week. That is about 70 big changes and 1700 small changes every year!!! Just for the core applications that are critical to business.
How does the business ensure that these perturbations to the existing systems do not break existing capabilities within the system? Worse, how does a business ensure that changes to these systems do not break other downstream applications that rely on data that is being captured in these systems? How do testing teams pick up the right test data to test all possible permutations and combinations that that these changes introduce?
Test data management solutions solve these problems with by subsetting a small set of product data to be used for testing and masking that data so that sensitive data is not exposed to a large set of users. Test data generation allows for data to be augmented for features that are currently not in production as well as introducing bad data in the environment.
As the number of applications increase many fold and the development strategies move from waterfall model to agile and continuous integration, there is a need to not only to provision test data, but provision it in such a way that it can be automated and used repeatedly. That requires a warehouse of test data that is categorized and tagged by test cases, test areas. This test data warehouse should:
- Allow testers to review test data in the warehouse, tag interesting data and data sets and be able to search on them
- In case test data is missing, augment test data
- Visualize the test data, identifying white spaces in test data and generate test data to fill the white space.
Once this warehouse of test datasets are available, testers should be able to reserve test records that are being used by them for their testing. These records can be used to restore the test environments from the test data warehouse. Testers can then run their test cases without impacting other testers who are using the same test environment.
This still leaves the question open on how do organizations test a business process that spans multiple systems. How can organizations create test data in these systems that allow a business process to be executed from one end to another? Once organizations can master this capability, they will reach their potential in automating their test processes, and reduce risk that each perturbation in the environment inevitably bring.
Every year, I get a replacement desk calendar to help keep all of our activities straight – and for a family of four, that is no easy task. I start with taking all of the little appointment cards the dentist, orthodontist, pediatrician and GP give to us for appointments that occur beyond the current calendar dates. I transcribe them all. Then I go through last year’s calendar to transfer any information that is relevant to this year’s calendar. And finally, I put the calendar down in the basement next to previous year calendars so I can refer back to them if I need. Last year’s calendar contains a lot of useful information, but no longer has the ability to solve my need to organize schedules for this year.
In a very loose way – this is very similar to application retirement. Many larger health plans have existing systems that were created several years (sometimes even several decades) ago. These legacy systems have been customized to reflect the health plan’s very specific business processes. They may be hosted on costly hardware, developed in antiquated software languages and rely on a few developers that are very close to retirement. The cost of supporting these (most likely) antiquated systems can be diverting valuable dollars away from innovation.
The process that I use to move appointment and contact data from one calendar to the next works for me – but is relatively small in scale. Imagine if I was trying to do this for an entire organization without losing context, detail or accuracy!
There are several methodologies for determining the best strategy for your organization to approach software modernization, including:
- Architecture Driven Modernization (ADM) is the initiative to standardize views of the existing systems in order to enable common modernization activities like code analysis and comprehension, and software transformation.
- SABA (Bennett et al., 1999) is a high-level framework for planning the evolution and migration of legacy systems, taking into account both organizational and technical issues.
- SRRT (Economic Model to Software Rewriting and Replacement Times), Chan et al. (1996), Formal model for determining optimal software rewrite and replacement timings based on versatile metrics data.
- And if all else fails: Model Driven Engineering (MDE) is being investigated as an approach for reverse engineering and then forward engineering software code
My calendar migration process evolved over time, your method for software modernization should be well planned prior to the go-live date for the new software system.
I have to admit, I was one of those who saw the movie and found the film humorous to say the least and can see why a desperate regime like North Korea would not want their leader admitting they love margarita’s and Katy Perry. What concerned me about the whole event was whether these unwanted security breaches were now just a fact of life? As a disclaimer, I have no affinity over the downfall of the North Korean government however what transpired was fascinating and amazing that companies like Sony continue to struggle to protect sensitive data despite being one of the largest companies in the world.
According to the Identity Theft Resource Center, there were 761 reported data security breaches in 2014 impacting over 83 million breached records across industries and geographies with B2B and B2C retailers leading the pack with 79.2% of all breaches. Most of these breaches originated through the internet via malicious WORMS and viruses purposely designed to identify and rely back sensitive information including credit card numbers, bank account numbers, and social security information used by criminals to wreak havoc and significant financial losses to merchants and financial institutions. According to the 2014 Ponemon Institute Research study:
- The average cost of cyber-crime per company in the US was $12.7 million this year, according to the Ponemon report, and US companies on average are hit with 122 successful attacks per year.
- Globally, the average annualized cost for the surveyed organizations was $7.6 million per year, ranging from $0.5 million to $61 million per company. Interestingly, small organizations have a higher per-capita cost than large ones ($1,601 versus $437), the report found.
- Some industries incur higher costs in a breach than others, too. Energy and utility organizations incur the priciest attacks ($13.18 million), followed closely by financial services ($12.97 million). Healthcare incurs the fewest expenses ($1.38 million), the report says.
Despite all the media attention around these awful events last year, 2015 does not seem like it’s going to get any better. According to CNBC just this morning, Morgan Stanley reported a data security breach where they had fired an employee who it claims stole account data for hundreds of thousands of its wealth management clients. Stolen information for approximately 900 of those clients was posted online for a brief period of time. With so much to gain from this rich data, businesses across industries have a tough battle ahead of them as criminals are getting more creative and desperate to steal sensitive information for financial gain. According to a Forrester Research, the top 3 breach activities included:
- Inadvertent misuse by insider (36%)
- Loss/theft of corporate asset (32%)
- Phishing (30%)
Given the growth in data volumes fueled by mobile, social, cloud, and electronic payments, the war against data breaches will continue to grow bigger and uglier for firms large and small. As such, Gartner predicts investments in Information Security Solutions will grow further 8.2 percent in 2015 vs. 2014 reaching $76.9+ billion globally. Furthermore, by 2018, more than half of organizations will use security services firms that specialize in data protection, security risk management and security infrastructure management to enhance their security postures.
Like any war, you have to know your enemy and what you are defending. In the war against data breaches, this starts with knowing where your sensitive data is before you can effectively defend against any attack. According to the Ponemon Institute, 18% of firms who were surveyed said they knew where their structured sensitive data was located where as the rest were not sure. 66% revealed that if would not be able to effectively know if they were attacked. Even worse, 47% were NOT confident at having visibility into users accessing sensitive or confidential information and that 48% of those surveyed admitted to a data breach of some kind in the last 12 months.
In closing, the responsibilities of today’s information security professional from Chief Information Security Officers to Security Analysts are challenging and growing each day as criminals become more sophisticated and desperate at getting their hands on one of your most important assets….your data. As your organizations look to invest in new Information Security solutions, make sure you start with solutions that allow you to identify where your sensitive data is to help plan an effective data security strategy both to defend your perimeter and sensitive data at the source. How prepared are you?
For more information about Informatica Data Security Solutions:
If you use production data in test and development environments or are looking for alternative approaches, register for the first webinar in a three part series on data security gaps and remediation. On December 9th, Adrian Lane, Security Analyst at Securosis, will join me to discuss security for test environments.
This is the first webinar in a three part series on data security gaps and remediation. This webinar will focus on how data centric security can be used to shore up vulnerabilities in one of the key focus areas, test and development environments. It’s common practice that non-production database environments are created by making copies of production data. This potentially exposes sensitive and confidential production data to developers, testers, and contractors alike. Commonly, 6-10 copies of production databases are created for each application environment and they are regularly provisioned to support development, testing and training efforts. Since security controls deployed for the source database are not replicated in the test environments, this is a glaring hole in data security and a target for external or internal exploits.
In this webinar, we will cover:
- Key trends in enterprise data security
- Vulnerabilities in non-production application environments (test and development)
- Alternatives to consider when protecting test and development environments
- Priorities for enterprises in reducing attack surface for their organization
- Compliance and internal audit cost reduction
- Data masking and synthetics data use cases
- Informatica Secure Testing capabilities
Register for the webinar today at http://infa.media/1pohKov. If you cannot attend the live event, be sure to watch the webinar on-demand.
The Healthcare and Life Sciences industry has demonstrated its ability to take advantage of data to fuel research, explore new ways to cure life threatening diseases, and save lives. With the adoption of technology innovation especially in the mobile technology segment, this industry will need to find a balance between investments and risk.
ModernMedicine.com published an article in May, 2014 stating how analysts worry that a wide-scale security breach could occur in healthcare and pharmaceuticals industry this year. The piece calls out that this industry category ranked the lowest in an S&P500 cyber health study because of its high volume of incidents and slow response rates.
In the Ponemon Institute’s research, The State of Data Centric Security, respondents from the Healthcare and Life Sciences stated the data they considered most at risk was customer, consumer and patient record data. Intellectual Property, Business Intelligence and Classified Data responses ranked a close second.
In an Informatica webinar with Alan Louie, Research Analyst from IDC Health Insights (@IDCPharmaGuru), we discussed his research on ‘Changing Times in the Life Sciences – Enabled and Empowered by Tech Innovation’. The megatrends of cloud, mobile, social networks and Big Data analytics are all moving in a positive direction with various phases of adoption. Mobile technologies tops the list of IT priorities – likely because of the productivity gains that can be achieved by mobile devices and applications. Security/Risk Management technologies listed as the second-highest priority.
When we asked Security Professionals in Life Sciences in the Ponemon Survey, ‘What keeps you up at night?’, the top answer was ‘migrating to new mobile platforms’. The reason I call this factoid out is that all other industry categories ranked ‘not knowing where sensitive data resides’ as the biggest concern. Why is Life Sciences different from other industries?
One reason could be the intense scrutiny over Intellectual Property protection and HIPPA compliance has already shone a light on where sensitive data reside. Mobile makes it difficult to track and contain a potential breach given that cell phones are the number 1 item left behind in taxi cabs.
With the threat of a major breach on the horizon, and the push to leverage technology such as mobile and cloud, it is evident that the investments in security and risk management need to focus on the data itself – rather than tie it to a specific technology or platform.
Enter Data-Centric Security. The call to action is to consider applying a new approach to the information security paradigm that emphasizes the security of the data itself rather than the security of networks or applications. Informatica recently published an eBook ‘Data-Centric Security eBook New Imperatives for a New Age of Data’. Download it, read it. In an industry with so much at stake, we highlight the need for new security measures such as these. Do you agree?
I encourage your comments and open the dialogue!
A few days ago, I came across a post, 5 C’s of MDM (Case, Content, Connecting, Cleansing, and Controlling), by Peter Krensky, Sr. Research Associate, Aberdeen Group and this response by Alan Duncan with his 5 C’s (Communicate, Co-operate, Collaborate, Cajole and Coerce). I like Alan’s list much better. Even though I work for a product company specializing in information management technology, the secret to successful enterprise information management (EIM) is in tackling the business and organizational issues, not the technology challenges. Fundamentally, data management at the enterprise level is an agreement problem, not a technology problem.
So, here I go with my 5 C’s: (more…)
The information security industry is reporting that more than 1.5 billion (yes, that’s with a “B”) emails and passwords have been hacked. It’s hard to tell from the article, but this could be the big one. (And just when we thought that James Bond had taken care of the Russian mafia.) From both large and small companies, nobody is safe. According to the experts the sites ranged from small e-commerce sites to Fortune 500 companies. At this time the experts aren’t telling us who the big targets were. We could be very unpleasantly surprised.
Most security experts admit that the bulk of the post-breach activity will be email spamming. Insidious to be sure. But imagine if the hackers were to get a little more intelligent about what they have. How many individuals reuse passwords? Experts say over 90% of consumers reuse passwords between popular sites. And since email addresses are the most universally used “user name” on those sites, the chance of that 1.5 billion identities translating into millions of pirated activities is fairly high.
According to the recent published Ponemon study; 24% of respondents don’t know where their sensitive data is stored. That is a staggering amount. Further complicating the issue, the same study notes that 65% of the respondents have no comprehensive data forensics capability. That means that consumers are more than likely to never hear from their provider that their data had been breached. Until it is too late.
So now I guess we all get to go change our passwords again. And we don’t know why, we just have to. This is annoying. But it’s not a permanent fix to have consumers constantly looking over their virtual shoulders. Let’s talk about the enterprise sized firms first. Ponemon indicates that 57% of respondents would like more trained data security personnel to protect data. And the enterprise firm should have the resources to task IT personnel to protect data. They also have the ability to license best in class technology to protect data. There is no excuse not to implement an enterprise data masking technology. This should be used hand in hand with network intrusion defenses to protect from end to end.
Smaller enterprises have similar options. The same data masking technology can be leveraged on smaller scale by a smaller IT organization including the personnel to optimize the infrastructure. Additionally, most small enterprises leverage Cloud based systems that should have the same defenses in place. The small enterprise should bias their buying criteria in data systems for those that implement data masking technology.
Let me add a little fuel to the fire and talk about a different kind of cost. Insurers cover Cyber Risk either as part of a Commercial General Liability policy or as a separate policy. In 2013, insurers paid an average approaching $3.5M for each cyber breach claim. The average per record cost of claims was over $6,000. Now, imagine your enterprise’s slice of those 1.5 billion records. Obviously these are claims, not premiums. Premiums can range up to $40,000 per year for each $1M in coverage. Insurers will typically give discounts for those companies that have demonstrated security practices and infrastructure. I won’t belabor the point, it’s pure math at this point.
There is plenty of risk and cost to go around, to be sure. But there is a way to stay protected with Informatica. And now, let’s all take a few minutes to go change our passwords. I’ll wait right here. There, do you feel better?
For more information on Informatica’s data masking technology click here, where you can drill into dynamic and persistent data masking technology, leading in the industry. So you should still change your passwords…but check out the industry’s leading data security technology after you do.
Today’s data pours into your business from more sources than ever before. It arrives in in all shapes, sizes and colors, changing every department and discipline along the way. Trying to manage it can feel like playing a gazillion simultaneous games of pong.
To get a handle on this, you have to put data first. This requires data-centric mindset, which will will help you:
- Embrace data volume and put it to use
- Fuse data flows in meaningful ways
- Inject data into dynamic business intelligence
- Explode silos and utilize your data assets across all departments and apps
To help with this, we’ve created a guide called The New Data Dynamics. This eBook explains the mindset shift needed to thrive. Discover the new data dynamics and learn:
- Why you need to think “data first,” not just application-centric
- What it looks like to put data first
- How to benefit from data that’s ready for anything, including:
- Combining with new data sources
- Integrating with new applications
- Feeding analytics and business intelligence
Data has changed forever. Embrace the new mindset and exploit the potential of your most powerful business asset yet. Download The New Data Dynamics eBook and learn how.
In response to the growth, organizations seek new ways to unlock the value of their data. Traditionally, data has been analyzed for a few key reasons. First, data was analyzed in order to identify ways to improve operational efficiency. Secondly, data was analyzed to identify opportunities to increase revenue.
As data expands, companies have found new uses for these growing data sets. Of late, organizations have started providing data to partners, who then sell the ‘intelligence’ they glean from within the data. Consider a coffee shop owner whose store doesn’t open until 8 AM. This owner would be interested in learning how many target customers (Perhaps people aged 25 to 45) walk past the closed shop between 6 AM and 8 AM. If this number is high enough, it may make sense to open the store earlier.
As much as organizations prioritize the value of data, customers prioritize the privacy of data. If an organization loses a customer’s data, it results in a several costs to the organization. These costs include:
- Damage to the company’s reputation
- A reduction of customer trust
- Financial costs associated with the investigation of the loss
- Possible governmental fines
- Possible restitution costs
To guard against these risks, data that organizations provide to their partners must be obfuscated. This protects customer privacy. However, data that has been obfuscated is often of a lower value to the partner. For example, if the date of birth of those passing the coffee shop has been obfuscated, the store owner may not be able to determine if those passing by are potential customers. When data is obfuscated without consideration of the analysis that needs to be done, analysis results may not be correct.
There is away to provide data privacy for the customer while simultaneously monetizing enterprise data. To do so, organizations must allow trusted partners to define masking generalizations. With sufficient data masking governance, it is indeed possible for data obfuscation and data value to coexist.
Currently, there is a great deal of research around ensuring that obfuscated data is both protected and useful. Techniques and algorithms like ‘k-Anonymity’ and ‘l-Diversity’ ensure that sensitive data is safe and secure. However, these techniques have have not yet become mainstream. Once they do, the value of big data will be unlocked.