Category Archives: Governance, Risk and Compliance
- PII – Personally Identifiable Information – any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII
- GSA’s Rules of Behavior for Handling Personally Identifiable Information – This directive provides GSA’s policy on how to properly handle PII and the consequences and corrective actions that will be taken if a breach occurs
- PHI – Protected Health Information – any information about health status, provision of health care, or payment for health care that can be lined to a specific individual
- HIPAA Privacy Rule – The HIPAA Privacy Rule establishes national standards to protect individuals’ medical records and other personal health information and applies to health plans, health care clearinghouses, and those health care providers that conduct certain health care transactions electronically. The Rule requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization. The Rule also gives patients rights over their health information, including rights to examine and obtain a copy of their health records, and to request corrections.
- Encryption – a method of protecting data by scrambling it into an unreadable form. It is a systematic encoding process which is only reversible with the right key.
- Tokenization – a method of replacing sensitive data with non-sensitive placeholder tokens. These tokens are swapped with data stored in relational databases and files.
- Data masking – a process that scrambles data, either an entire database or a subset. Unlike encryption, masking is not reversible; unlike tokenization, masked data is useful for limited purposes. There are several types of data masking:
- Static data masking (SDM) masks data in advance of using it. Non production databases masked NOT in real-time.
- Dynamic data masking (DDM) masks production data in real time
- Data Redaction – masks unstructured content (PDF, Word, Excel)
Each of the three methods for protecting data (encryption, tokenization and data masking) have different benefits and work to solve different security issues . We’ll address them in a bit. For a visual representation of the three methods – please see the table below:
For protecting PHI data – encryption is superior to tokenization. You encrypt different portions of personal healthcare data under different encryption keys. Only those with the requisite keys can see the data. This form of encryption requires advanced application support to manage the different data sets to be viewed or updated by different audiences. The key management service must be very scalable to handle even a modest community of users. Record management is particularly complicated. Encryption works better than tokenization for PHI – but it does not scale well.
Properly deployed, encryption is a perfectly suitable tool for protecting PII. It can be set up to protect archived data or data residing on file systems without modification to business processes.
- To protect the data, you must install encryption and key management services to protect the data – this only protects the data from access that circumvents applications
- You can add application layer encryption to protect data in use
- This requires changing applications and databases to support the additional protection
- You will pay the cost of modification and the performance of the application will be impacted
For tokenization of PHI – there are many pieces of data which must be bundled up in different ways for many different audiences. Using the tokenized data requires it to be de-tokenized (which usually includes a decryption process). This introduces an overhead to the process. A person’s medical history is a combination of medical attributes, doctor visits, outsourced visits. It is an entangled set of personal, financial, and medical data. Different groups need access to different subsets. Each audience needs a different slice of the data – but must not see the rest of it. You need to issue a different token for each and every audience. You will need a very sophisticated token management and tracking system to divide up the data, issuing and tracking different tokens for each audience.
Masking can scramble individual data columns in different ways so that the masked data looks like the original (retaining its format and data type) but it is no longer sensitive data. Masking is effective for maintaining aggregate values across an entire database, enabling preservation of sum and average values within a data set, while changing all the individual data elements. Masking plus encryption provide a powerful combination for distribution and sharing of medical information
Traditionally, data masking has been viewed as a technique for solving a test data problem. The December 2014 Gartner Magic Quadrant Report on Data Masking Technology extends the scope of data masking to more broadly include data de-identification in production, non-production, and analytic use cases. The challenge is to do this while retaining business value in the information for consumption and use.
Masked data should be realistic and quasi-real. It should satisfy the same business rules as real data. It is very common to use masked data in test and development environments as the data looks like “real” data, but doesn’t contain any sensitive information.
In an RSA Conference session entitled IAPP: Engineering Privacy: Why Security Isn’t Enough, Sagi Leizerov, E&Y’s Privacy Practice leader began with a plea:
‘We need effective ways to bring together privacy and security controls in an automated way”
Privacy professionals, according to Sagi, essentially need help in determining the use of information – which is a foundational definition of data privacy. Security tools and controls can provide the information necessary to perform that type of investigation conducted by privacy officers. Yet as data proliferates, are the existing security tools truly up for the task?
In other sessions, such as A Privacy Primer for Security Officers , many speakers are claiming that Data Security projects get prioritized as a result of a need to comply with Data Privacy policies and legislation.
We are in an age where data proliferation is one of the major sources of pain for both Chief Information Security Officers and Chief Privacy and Risk Officers (CPO/CRO). Business systems that were designed to automate key business processes store sensitive and private information are primary sources of data for business analytics. As more business users want access data to understand the state of their businesses, data naturally proliferates. Data proliferates to spreadsheets and presentations, emailed in and out of a corporate network, and potentially stored in a public cloud storage offering.
Even though the original intention for using this information was likely all above board, one security violation could potentially open up a can of worms for nefarious characters to take advantage of this data for mal intent. Jeff Northrop, the CTO of the International Association of Privacy Professionals (IAPP) suggests we need to close the gap between security and privacy in a panel discussion with Larry Ponemon, founder of the Ponemon Institute.
Sagi concluded his session by stating ‘Be a voice of change in your organization. Pilot products, be courageous, give new ideas a chance.’ In the recent launch of Informatica Secure@Source, we discuss the need for more alignment between security and privacy teams and the industry seems to agree. Congratulations to the Informatica Secure@Source development team for their recent announcement of winning Gold Medal in the New Product and Service Category at the Info Security Products Guide 2015 Global Excellence Awards!
For more on the importance of Data Security Intelligence in Privacy, watch Larry Ponemon, Founder of the Ponemon Institute and Jeff Northrop, CTO IAPP discuss this topic with Arnold Federbaum, former CISO and Adjunct Professor, NYU, and Linda Hewlett, Sr Enterprise Security Architect, Santander Holdings USA.
If unable to view the video, click here.
The International Association of Privacy Professionals (IAPP) held its Global Privacy Summit in Washington DC March 4-6. The topic of Data-Centric Security was presented by Informatica’s Robert Shields, Product Marketing, Data Security Group. Here is a quick recap of the conversation in case you missed it.
In an age of the massive data breach, there is agreement between security and privacy professionals that we must redefine privacy policies and controls. What we are doing is just not working effectively. Network, Host and Endpoint Security needs to be strengthened by Data-Centric Security approaches. The focus needs to be on using data security controls such that they can be enforced no matter where sensitive or confidential data proliferates.
Data-Centric Security does not mean ‘encrypt it all’. That is completely impractical and introduces unnecessary cost and complexities. The approach can be simplified into four categorical steps: 1. Classify it, 2. Find it, 3. Assess its risk, 4. Protect it.
1. Classify it.
The idea behind Data-Centric Security is that based on policy, an enterprise defines its classifications of what is sensitive and confidential then apply controls to that set of data. For example, if the only classified and sensitive data that you store in your enterprise is employee data, than focus on just employee data. No need to boil the ocean in that case. However, if you have several data domains of sensitive and confidential data, you need to know where it resides and assess its risk to help prioritize your moves.
2. Find it.
Discover where in your enterprise sensitive and classified data reside. This means looking at how data is proliferating from its source to multiple targets – and not just copies made for backup and disaster recovery purposes.
For example, if you have a data warehouse where sensitive and confidential data is being loaded through a transformation process, the data is still considered classified or sensitive, but its shape or form may have changed. You also need to know when data leaves the firewall it becomes available to view on a mobile device, or accessible by a remote team, such as offshore development and support teams.
3.Assess its risk.
Next, you need to be able to assess the data risk based the number of users who may have access to the data and where those users are physically located and based on existing security controls that may already exist. If large volumes of sensitive data is potentially being exposed to a large population in another country, you might want to consider this data more at risk than a few number of records that are encrypted residing in your protected data center. That helps you prioritize where to start implementing controls to maximize the return on your efforts.
4. Protect it.
Once you have a sense of prioritization, you can then apply the appropriate, cost effective controls that aligns with its level of risk. Place monitoring tools around the sensitive data and detect when usage patterns become unusual. Train on normal user behavior and then initiate an alert to recommend a change to the application of a control.
In a world where policies are defined and enforced based on data privacy regulations and standards, it only makes sense to align the right intelligence and controls to ensure proper enforcement. In reality these four steps are complex and they do require cross-functional teams to come together and agree on a strategy.
After a careful review by Informatica, the recent Ghost buffer overflow vulnerability (CVE-2015-0235) does not require any Informatica patches for our on-premise products. All Informatica cloud-hosted services were patched by Jan 30.
What you need to know
Ghost is a buffer overflow vulnerability found in glibc (GNU C Library), most commonly found on Linux systems. All distributions of Linux are potentially affected. The most common attack vectors involve Linux servers that are hosting web apps, email servers, and other such services that accept requests over the open Internet; hackers can embed malicious code therein. Fixed versions of glibc are now already available from their respective Linux vendors, including:
- Red Hat: https://access.redhat.com/articles/1332213
What you need to do
Because many of our products link to glibc.zip, we recommend customers apply the appropriate OS patch from their Linux vendor. After applying this OS patch, customers should restart Informatica services running on that machine to ensure our software is linking to the up-to-date glibc library. To ensure all other resources on a system are patched, a full system reboot may also be necessary.
Bill Burns, VP & Chief Information Security Officer
I’ve spent most of my career working with new technology, most recently helping companies make sense of mountains of incoming data. This means, as I like to tell people, that I have the sexiest job in the 21st century.
Harvard Business Review put the data scientist into the national spotlight in their publication Data Scientist: The Sexiest Job of the 21st Century. Job trends data from Indeed.com confirms the rise in popularity for the position, showing that the number of job postings for data scientist positions increased by 15,000%.
In the meantime, the role of data scientist has changed dramatically. Data used to reside on the fringes of the operation. It was usually important but seldom vital – a dreary task reserved for the geekiest of the geeks. It supported every function but never seemed to lead them. Even the executives who respected it never quite absorbed it.
For every Big Data problem, the solution often rests on the shoulders of a data scientist. The role of the data scientist is similar in responsibility to the Wall Street “quants” of the 80s and 90s – now, these data experienced are tasked with the management of databases previously thought too hard to handle, and too unstructured to derive any value.
So, is it the sexiest job of the 21st Century?
Think of a data scientist more like the business analyst-plus, part mathematician, part business strategist, these statistical savants are able to apply their background in mathematics to help companies tame their data dragons. But these individuals aren’t just math geeks, per se.
A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a renaissance individual who really wants to learn and bring change to an organization.
If this sounds like you, the good news is demand for data scientists is far outstripping supply. Nonetheless, with the rising popularity of the data scientist – not to mention the companies that are hiring for these positions – you have to be at the top of your field to get the jobs.
Companies look to build teams around data scientists that ask the most questions about:
- How the business works
- How it collects its data
- How it intends to use this data
- What it hopes to achieve from these analyses
These questions were important because data scientists will often unearth information that can “reshape an entire company.” Obtaining a better understanding of the business’ underpinnings not only directs the data scientist’s research, but helps them present the findings and communicate with the less-analytical executives within the organization.
While it’s important to understand your own business, learning about the successes of other corporations will help a data scientist in their current job–and the next.
In our house when we paint a room, my husband does the big rolling of the walls or ceiling, I do the cut-in work. I am good at prepping the room, taping all the trim and deliberately painting the corners. However, I am thrifty and constantly concerned that we won’t have enough paint to finish a room. My husband isn’t afraid to use enough paint and is extremely efficient at painting a wall in a single even coat. As a result, I don’t do the big rolling and he doesn’t do the cutting in. It took us awhile to figure this out, and a few rooms had to be repainted while we were figuring it out. Now we know what we are good at, and what we need help with.
Payers roles are changing. Payers were previously focused on risk assessment, setting and collecting premiums, analyzing claims and making payments – all while optimizing revenues. Payers are pretty good at selling to employers, figuring out the cost/benefit ratio from an employers perspective and ensuring a good, profitable product. With the advent of the Affordable Healthcare Act along with a much more transient insured population, payers now must focus more on the individual insured and be able to communicate with the individuals in a more nimble manner than in the past.
Individual members will shop for insurance based on consumer feedback and price. They are interested in ease of enrollment and the ability to submit and substantiate claims quickly and intuitively. Payers are discovering that they need to help manage population health at a individual member level. And population health management requires less of a business-data analytics approach and more social media and gaming-style logic to understand patients. In this way, payers can help develop interventions to sustain behavioral changes for better health.
When designing such analytics, payers should consider the following key design steps:
- Extend data warehouses to an analytics appliance
- Invest in a big data platform to absorb patients’ social data
- Build predictive analytics for patient behavior
- Bridge collaborative and behavioral analytics with claims to build revenue and profitability
Due to payers’ mature predictive analytics competencies, they will have a much easier time in the next generation of population behavior compared to their provider counterparts. As clinical content is often unstructured compared to the claims data, payers need to pay extra attention to context and semantics when deciphering clinical content submitted by providers. Payers can use help from vendors that can help them understand unstructured data, individual members. They can then use that data to create fantastic predictive analytic solutions.
This week, another reputable organization, Anthem Inc, reported it was ‘the target of a very sophisticated external cyber attack’. But rather than be upset at Anthem, I respect their responsible data breach reporting.
In this post from Joseph R. Swedish, President and CEO, Anthem, Inc., does something that I believe all CEO’s should do in this situation. He is straight up about what happened, what information was breached, actions they took to plug the security hole, and services available to those impacted.
When it comes to a data breach, the worst thing you can do is ignore it or hope it will go away. This was not the case with Anthem. Mr Swedish did the right thing and I appreciate it.
You only have one corporate reputation – and it is typically aligned with the CEO’s reputation. When the CEO talks about the details of a data breach and empathizes with those impacted, he establishes a dialogue based on transparency and accountability.
Research that tells us 44% of healthcare and pharmaceutical organizations experienced a breach in 2014. And we know that when personal information when combined with health information is worth more on the black market because the data can be used for insurance fraud. I expect more healthcare providers will be on the defensive this year and only hope that they follow Mr Swedish’s example when facing the music.
To level set, let’s make sure you understand my definition of dark data. I prefer using visualizations when I can so, picture this: the end of the first Indiana Jones movie, Raiders of the Lost Ark. In this scene, we see the Ark of the Covenant, stored in a generic container, being moved down the aisle in a massive warehouse full of other generic containers. What’s in all those containers? It’s pretty much anyone’s guess. There may be a record somewhere, but, for all intents and purposes, the materials stored in those boxes are useless.
Applying this to data, once a piece of data gets shoved into some generic container and is stored away, just like the Arc, the data becomes essentially worthless. This is dark data.
Opening up a government agency to all its dark data can have significant impacts, both positive and negative. Here are couple initial tips to get you thinking in the right direction:
- Begin with the end in mind – identify quantitative business benefits of exposing certain dark data.
- Determine what’s truly available – perform a discovery project – seek out data hidden in the corners of your agency – databases, documents, operational systems, live streams, logs, etc.
- Create an extraction plan – determine how you will get access to the data, how often does the data update, how will handle varied formats?
- Ingest the data – transform the data if needed, integrate if needed, capture as much metadata as possible (never assume you won’t need a metadata field, that’s just about the time you will be proven wrong).
- Govern the data – establish standards for quality, access controls, security protections, semantic consistency, etc. – don’t skimp here, the impact of bad data can never really be quantified.
- Store it – it’s interesting how often agencies think this is the first step
- Get the data ready to be useful to people, tools and applications – think about how to minimalize the need for users to manipulate data – reformatting, parsing, filtering, etc. – to better enable self-service.
- Make it available – at this point, the data should be easily accessible, easily discoverable, easily used by people, tools and applications.
Clearly, there’s more to shining the light on dark data than I can offer in this post. If you’d like to take the next step to learning what is possible, I suggest you download the eBook, The Dark Data Imperative.
I live in a very small town in Maine. I don’t spend a lot of time thinking about my privacy. Some would say that by living in a small town, you give up your right to privacy because everyone knows what everyone else is doing. Living here is a choice – for me to improve my family’s quality of life. Sharing all of the details of my life – not so much.
When I go to my doctor (who also happens to be a parent from my daughter’s school), I fully expect that any sort of information that I share with him, or that he obtains as a result of lab tests or interviews, or care that he provides is not available for anyone to view. On the flip side, I want researchers to be able to take my lab information combined with my health history in order to do research on the effectiveness of certain medications or treatment plans.
As a result of this dichotomy, Congress (in 1996) started to address governance regarding the transmission of this type of data. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a Federal law that sets national standards for how health care plans, health care clearinghouses, and most health care providers protect the privacy of a patient’s health information. With certain exceptions, the Privacy Rule protects a subset of individually identifiable health information, known as protected health information or PHI, that is held or maintained by covered entities or their business associates acting for the covered entity. PHI is any information held by a covered entity which concerns health status, provision of health care, or payment for health care that can be linked to an individual.
Many payers have this type of data in their systems (perhaps in a Claims Administration system), and have the need to share data between organizational entities. Do you know if PHI data is being shared outside of the originating system? Do you know if PHI is available to resources that have no necessity to access this information? Do you know if PHI data is being shared outside your organization?
If you can answer yes to each of these questions – fantastic. You are well ahead of the curve. If not – you need to start considering solutions that can
- Identify PHI in all of your data streams
- Monitor and track the flow of this data throughout your organization and
- Mask this data if it is being shared with resources that don’t need to be able to identify the individual.
I want to researchers to have access to medically relevant data so they can find the cures to some horrific diseases. I want to feel comfortable sharing health information with my doctor. I want to feel comfortable that my health insurance company is respecting my privacy. Now to get my kids to stop oversharing.