Category Archives: Data Privacy
- PII – Personally Identifiable Information – any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII
- GSA’s Rules of Behavior for Handling Personally Identifiable Information – This directive provides GSA’s policy on how to properly handle PII and the consequences and corrective actions that will be taken if a breach occurs
- PHI – Protected Health Information – any information about health status, provision of health care, or payment for health care that can be linked to a specific individual
- HIPAA Privacy Rule – The HIPAA Privacy Rule establishes national standards to protect individuals’ medical records and other personal health information and applies to health plans, health care clearinghouses, and those health care providers that conduct certain health care transactions electronically. The Rule requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization. The Rule also gives patients rights over their health information, including rights to examine and obtain a copy of their health records, and to request corrections.
- Encryption – a method of protecting data by scrambling it into an unreadable form. It is a systematic encoding process which is only reversible with the right key.
- Tokenization – a method of replacing sensitive data with non-sensitive placeholder tokens. These tokens are swapped with data stored in relational databases and files.
- Data masking – a process that scrambles data, either an entire database or a subset. Unlike encryption, masking is not reversible; unlike tokenization, masked data is useful for limited purposes. There are several types of data masking:
- Static data masking (SDM) masks data in advance of using it. Non-production databases are masked ahead of time, not in real time.
- Dynamic data masking (DDM) masks production data in real time
- Data Redaction – masks unstructured content (PDF, Word, Excel)
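To make the difference between tokenization and masking concrete, here is a minimal Python sketch. The `TokenVault` class, the `mask_ssn` function and the sample value are all hypothetical and exist only for illustration; a real token vault would be a hardened, access-controlled service rather than an in-memory dictionary.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (illustration only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, carries no information
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only a caller with access to the vault can swap the token back.
        return self._token_to_value[token]


def mask_ssn(ssn: str) -> str:
    """Irreversible mask: keep the format and last four digits, hide the rest."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    digits_seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen > total_digits - 4 else "X")
        else:
            out.append(ch)  # separators are kept so the format survives
    return "".join(out)


vault = TokenVault()
token = vault.tokenize("123-45-6789")   # made-up sample value
original = vault.detokenize(token)      # reversible through the vault
masked = mask_ssn("123-45-6789")        # "XXX-XX-6789", not reversible
```

The token can be exchanged for the original value by anyone allowed to query the vault, while the masked value cannot be turned back into the original by anyone.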
Each of the three methods for protecting data (encryption, tokenization and data masking) has different benefits and solves different security issues. We’ll address them in a bit. For a visual representation of the three methods, please see the table below:
For protecting PHI data – encryption is superior to tokenization. You encrypt different portions of personal healthcare data under different encryption keys. Only those with the requisite keys can see the data. This form of encryption requires advanced application support to manage the different data sets to be viewed or updated by different audiences. The key management service must be very scalable to handle even a modest community of users. Record management is particularly complicated. Encryption works better than tokenization for PHI – but it does not scale well.
Properly deployed, encryption is a perfectly suitable tool for protecting PII. It can be set up to protect archived data or data residing on file systems without modification to business processes.
- You must install encryption and key management services to protect the data – this only protects the data from access that circumvents applications
- You can add application layer encryption to protect data in use
- This requires changing applications and databases to support the additional protection
- You will pay the cost of modification and the performance of the application will be impacted
For tokenization of PHI – there are many pieces of data which must be bundled up in different ways for many different audiences. Using the tokenized data requires it to be de-tokenized (which usually includes a decryption process). This introduces an overhead to the process. A person’s medical history is a combination of medical attributes, doctor visits, outsourced visits. It is an entangled set of personal, financial, and medical data. Different groups need access to different subsets. Each audience needs a different slice of the data – but must not see the rest of it. You need to issue a different token for each and every audience. You will need a very sophisticated token management and tracking system to divide up the data, issuing and tracking different tokens for each audience.
Masking can scramble individual data columns in different ways so that the masked data looks like the original (retaining its format and data type) but it is no longer sensitive data. Masking is effective for maintaining aggregate values across an entire database, enabling preservation of sum and average values within a data set, while changing all the individual data elements. Masking plus encryption provide a powerful combination for distribution and sharing of medical information
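One simple way to see how masking can preserve aggregate values is to shuffle a column across rows. The Python sketch below is an assumption-laden illustration, not a description of any particular masking product: `shuffle_mask` and the sample rows are hypothetical, and shuffling alone is a weak mask (some rows may keep their own value by chance).

```python
import random


def shuffle_mask(rows, column, seed=None):
    """Return copies of the rows with one column's values shuffled across rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)  # same values, different owners
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical sample data
patients = [
    {"patient": "A", "charge": 120},
    {"patient": "B", "charge": 450},
    {"patient": "C", "charge": 75},
    {"patient": "D", "charge": 980},
]

masked = shuffle_mask(patients, "charge", seed=7)

total_before = sum(row["charge"] for row in patients)
total_after = sum(row["charge"] for row in masked)
```

Because the masked column holds exactly the same values in a different order, the sum and average are unchanged and every value keeps its original format and data type.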
Traditionally, data masking has been viewed as a technique for solving a test data problem. The December 2014 Gartner Magic Quadrant Report on Data Masking Technology extends the scope of data masking to more broadly include data de-identification in production, non-production, and analytic use cases. The challenge is to do this while retaining business value in the information for consumption and use.
Masked data should be realistic and quasi-real. It should satisfy the same business rules as real data. It is very common to use masked data in test and development environments as the data looks like “real” data, but doesn’t contain any sensitive information.
In an RSA Conference session entitled IAPP: Engineering Privacy: Why Security Isn’t Enough, Sagi Leizerov, E&Y’s Privacy Practice leader, began with a plea:
‘We need effective ways to bring together privacy and security controls in an automated way.’
Privacy professionals, according to Sagi, essentially need help in determining the use of information – which is a foundational definition of data privacy. Security tools and controls can provide the information necessary to perform that type of investigation conducted by privacy officers. Yet as data proliferates, are the existing security tools truly up for the task?
In other sessions, such as A Privacy Primer for Security Officers, many speakers are claiming that Data Security projects get prioritized as a result of a need to comply with Data Privacy policies and legislation.
We are in an age where data proliferation is one of the major sources of pain for both Chief Information Security Officers and Chief Privacy and Risk Officers (CPO/CRO). Business systems that were designed to automate key business processes store sensitive and private information and are primary sources of data for business analytics. As more business users want access to data to understand the state of their businesses, data naturally proliferates. It spreads to spreadsheets and presentations, is emailed in and out of a corporate network, and is potentially stored in a public cloud storage offering.
Even though the original intention for using this information was likely all above board, one security violation could potentially open up a can of worms for nefarious characters to take advantage of this data with malicious intent. Jeff Northrop, the CTO of the International Association of Privacy Professionals (IAPP), suggests in a panel discussion with Larry Ponemon, founder of the Ponemon Institute, that we need to close the gap between security and privacy.
Sagi concluded his session by stating ‘Be a voice of change in your organization. Pilot products, be courageous, give new ideas a chance.’ In the recent launch of Informatica Secure@Source, we discuss the need for more alignment between security and privacy teams and the industry seems to agree. Congratulations to the Informatica Secure@Source development team for their recent announcement of winning Gold Medal in the New Product and Service Category at the Info Security Products Guide 2015 Global Excellence Awards!
For more on the importance of Data Security Intelligence in Privacy, watch Larry Ponemon, Founder of the Ponemon Institute and Jeff Northrop, CTO IAPP discuss this topic with Arnold Federbaum, former CISO and Adjunct Professor, NYU, and Linda Hewlett, Sr Enterprise Security Architect, Santander Holdings USA.
When’s the last time you visited your local branch bank and spoke to a human being? How about talking to your banker over the phone? Can’t remember? Well, you’re not alone and don’t worry, it’s not a bad thing. Physical branches with expensive workers to greet and service customers are being replaced with more modern and customer-friendly mobile banking applications that allow consumers to deposit checks from the phone, apply for a mortgage and sign closing documents electronically, and skip the trip to an ATM for physical cash by using mobile payment solutions like Apple Pay. In fact, a new report titled ‘Bricks + Clicks: Building the Digital Branch,’ from Jeanne Capachin and Jim Marous takes an in-depth look at how banks and credit unions are changing their branch and customer channel strategies to meet the demand of today’s digital banking customer.
Why am I talking about this? These market trends are dominating the CEO and CIO agenda in today’s banking industry. I just returned from the 2015 IDC Asian Financial Congress event in Singapore where the digital journey for the next generation bank was a major agenda item. According to IDC Financial Insights, global banks will invest $31.5B USD in core banking modernization to enable these services, improve operational efficiency, and position these banks to better compete on technology and convenience across markets. Core banking modernization initiatives are complex, costly, and fraught with risks. Let’s take a closer look.
Last week was Informatica’s first ever Data Mania event, held at the Contemporary Jewish Museum in San Francisco. We had an A-list lineup of speakers from leading cloud and data companies, such as Salesforce, Amazon Web Services (AWS), Tableau, Dun & Bradstreet, Marketo, AppDynamics, Birst, Adobe, and Qlik. The event and speakers covered a range of topics all related to data, including Big Data processing in the cloud, data-driven customer success, and cloud analytics.
While these companies are giants today in the world of cloud and have created their own unique ecosystems, we also wanted to take a peek at and hear from the leaders of tomorrow. Before startups can become market leaders in their own realm, they face the challenge of ramping up a stellar roster of customers so that they can get to subsequent rounds of venture funding. But what gets in their way are the numerous data integration challenges of onboarding customer data onto their software platform. When these challenges remain unaddressed, R&D resources are spent on professional services instead of building value-differentiating IP. Bugs also continue to mount, and technical debt increases.
Enter the Informatica Cloud Connector SDK. Built entirely in Java and able to browse through any cloud application’s API, the Cloud Connector SDK parses the metadata behind each data object and presents it in the context of what a business user should see. We had four startups build a native connector to their application in less than two weeks: BigML, Databricks, FollowAnalytics, and ThoughtSpot. Let’s take a look at each one of them.
With predictive analytics becoming a growing imperative, machine-learning algorithms that can have a higher probability of prediction are also becoming increasingly important. BigML provides an intuitive yet powerful machine-learning platform for actionable and consumable predictive analytics. Watch their demo on how they used Informatica Cloud’s Connector SDK to help them better predict customer churn.
Can’t play the video? Click here, http://youtu.be/lop7m9IH2aw
Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark. Databricks Cloud is a hosted end-to-end data platform powered by Spark. It enables organizations to unlock the value of their data, seamlessly transitioning from data ingest through exploration and production. Watch their demo that showcases how the Informatica Cloud connector for Databricks Cloud was used to analyze lead contact rates in Salesforce and to perform machine learning on a dataset built using either Scala or Python.
Can’t play the video? Click here, http://youtu.be/607ugvhzVnY
With mobile usage growing by leaps and bounds, the area of customer engagement on a mobile app has become a fertile area for marketers. Marketers are charged with acquiring new customers, increasing customer loyalty and driving new revenue streams. But without the technological infrastructure to back them up, their efforts are in vain. FollowAnalytics is a mobile analytics and marketing automation platform for the enterprise that helps companies better understand audience engagement on their mobile apps. Watch this demo where FollowAnalytics first builds a completely native connector to its mobile analytics platform using the Informatica Cloud Connector SDK and then connects it to Microsoft Dynamics CRM Online using Informatica Cloud’s prebuilt connector for it. Then, see FollowAnalytics go one step further by performing even deeper analytics on their engagement data using Informatica Cloud’s prebuilt connector for Salesforce Wave Analytics Cloud.
Can’t play the video? Click here, http://youtu.be/E568vxZ2LAg
Analytics has taken center stage this year due to the rise in cloud applications, but most of the existing BI tools out there still stick to the old way of doing BI. ThoughtSpot brings a consumer-like simplicity to the world of BI by allowing users to search for the information they’re looking for just as if they were using a search engine like Google. Watch this demo where ThoughtSpot uses Informatica Cloud’s vast library of over 100 native connectors to move data into the ThoughtSpot appliance.
Can’t play the video? Click here, http://youtu.be/6gJD6hRD9h4
The International Association of Privacy Professionals (IAPP) held its Global Privacy Summit in Washington DC March 4-6. The topic of Data-Centric Security was presented by Informatica’s Robert Shields, Product Marketing, Data Security Group. Here is a quick recap of the conversation in case you missed it.
In an age of the massive data breach, there is agreement between security and privacy professionals that we must redefine privacy policies and controls. What we are doing is just not working effectively. Network, Host and Endpoint Security needs to be strengthened by Data-Centric Security approaches. The focus needs to be on using data security controls such that they can be enforced no matter where sensitive or confidential data proliferates.
Data-Centric Security does not mean ‘encrypt it all’. That is completely impractical and introduces unnecessary cost and complexities. The approach can be simplified into four categorical steps: 1. Classify it, 2. Find it, 3. Assess its risk, 4. Protect it.
1. Classify it.
The idea behind Data-Centric Security is that, based on policy, an enterprise defines its classifications of what is sensitive and confidential and then applies controls to that set of data. For example, if the only classified and sensitive data that you store in your enterprise is employee data, then focus on just employee data. No need to boil the ocean in that case. However, if you have several data domains of sensitive and confidential data, you need to know where it resides and assess its risk to help prioritize your moves.
2. Find it.
Discover where in your enterprise sensitive and classified data reside. This means looking at how data is proliferating from its source to multiple targets – and not just copies made for backup and disaster recovery purposes.
For example, if you have a data warehouse where sensitive and confidential data is being loaded through a transformation process, the data is still considered classified or sensitive, but its shape or form may have changed. You also need to know when data leaves the firewall and becomes available to view on a mobile device or accessible by a remote team, such as offshore development and support teams.
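A discovery pass can be sketched as a pattern scan over each place data might live. The Python below is a deliberately small illustration: the `PATTERNS` table, the `find_sensitive` function, the store names and the sample records are all assumptions, and real discovery tools use far richer detection than two regular expressions.

```python
import re

# Hypothetical classification patterns: label -> regular expression
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive(stores):
    """Map each location to the count of sensitive values found, by label."""
    findings = {}
    for location, records in stores.items():
        hits = {}
        for text in records:
            for label, pattern in PATTERNS.items():
                count = len(pattern.findall(text))
                if count:
                    hits[label] = hits.get(label, 0) + count
        if hits:  # only report locations that actually hold sensitive data
            findings[location] = hits
    return findings


# Made-up locations and records
stores = {
    "crm_prod": ["Jane Doe, 123-45-6789, jane@example.com"],
    "dw_staging": ["cust 42 ssn 987-65-4321"],
    "wiki": ["no sensitive data here"],
}

findings = find_sensitive(stores)
```

Running the scan against every source and target, not only the system of record, is what reveals where copies have proliferated.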
3. Assess its risk.
Next, you need to be able to assess the data risk based on the number of users who may have access to the data, where those users are physically located, and the security controls that already exist. If large volumes of sensitive data are potentially being exposed to a large population in another country, you might want to consider this data more at risk than a small number of encrypted records residing in your protected data center. That helps you prioritize where to start implementing controls to maximize the return on your efforts.
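As a rough illustration of this kind of prioritization, the sketch below scores each data store from the factors named above. Everything in it is an assumption made for the example: the `DataStore` fields, the `risk_score` formula, the two weights and the sample figures. It is not a standard risk model.

```python
from dataclasses import dataclass

OFFSHORE_MULTIPLIER = 2    # assumed weight for access from another country
ENCRYPTION_DIVISOR = 10    # assumed credit for an existing control


@dataclass
class DataStore:
    name: str
    sensitive_records: int
    users_with_access: int
    offshore_access: bool
    encrypted: bool


def risk_score(store: DataStore) -> float:
    """Higher score means the store should be looked at sooner."""
    score = store.sensitive_records * store.users_with_access
    if store.offshore_access:
        score *= OFFSHORE_MULTIPLIER
    if store.encrypted:
        score /= ENCRYPTION_DIVISOR
    return score


# Made-up inventory
inventory = [
    DataStore("datacenter_archive", 500, 5, offshore_access=False, encrypted=True),
    DataStore("offshore_test_db", 1_000_000, 200, offshore_access=True, encrypted=False),
]

# Work through the riskiest stores first
prioritized = sorted(inventory, key=risk_score, reverse=True)
```

With these sample numbers the large, unencrypted test database seen by an offshore team ranks far above the small encrypted archive, which matches the intuition in the paragraph above.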
4. Protect it.
Once you have a sense of prioritization, you can then apply the appropriate, cost-effective controls that align with the data’s level of risk. Place monitoring tools around the sensitive data and detect when usage patterns become unusual. Train on normal user behavior and then initiate an alert to recommend a change to the application of a control.
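Detecting unusual usage can be as simple as comparing today's activity with a learned baseline. The Python sketch below assumes a per-user history of daily record counts and flags anything more than three standard deviations from the mean; the `is_unusual` function, the threshold and the sample counts are illustrative assumptions, and production monitoring would use more robust behavioral models.

```python
from statistics import mean, stdev


def is_unusual(baseline_counts, todays_count, threshold=3.0):
    """Flag today's count if it is far from the user's normal behavior.

    baseline_counts: past daily counts of sensitive records accessed
                     (at least two values are needed for a deviation).
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return todays_count != mu
    return abs(todays_count - mu) / sigma > threshold


# Made-up history for one user
history = [20, 22, 19, 21, 20, 23, 18]

normal_day = is_unusual(history, 21)     # within the usual range
bulk_export = is_unusual(history, 400)   # should raise an alert
```

An alert raised this way is the trigger for the recommendation step: a reviewer decides whether to tighten the control on that data.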
In a world where policies are defined and enforced based on data privacy regulations and standards, it only makes sense to align the right intelligence and controls to ensure proper enforcement. In reality these four steps are complex and they do require cross-functional teams to come together and agree on a strategy.
Valentine’s Day is such a strange holiday. It always seems to bring up more questions than answers. And the internet always seems to have a quiz to find out the answer! There’s the “Does he have a crush on you too – 10 simple ways to find out” quiz. There’s the “What special gift should I get her this Valentine’s Day?” quiz. And the ever popular “Why am I still single on Valentine’s Day?” quiz.
Well Marketers, it’s your lucky Valentine’s Day! We have a quiz for you too! It’s about your relationship with data. Where do you stand? Are you ready to take the next step?
Question 1: Do you connect – I mean, really connect – with your data?
□ (A) Not really. We just can’t seem to get it together and really connect.
□ (B) Sometimes. We connect on some levels, but there are big gaps.
□ (C) Most of the time. We usually connect, but we miss out on some things.
□ (D) We are a perfect match! We connect about everything, no matter where, no matter when.
Translation: Data ready marketers have access to the best possible data, no matter what form it is in, no matter what system it is in. They are able to make decisions based on everything the entire organization “knows” about their customer/partner/product – with a complete 360 degree view. And they are also able to connect to and integrate with data outside the bounds of their organization to achieve the sought-after 720 degree view. They can integrate and react to social media comments, trends, and feedback – in real time – and match it with an existing record whenever possible. And they can quickly and easily bring together any third party data sources they may need.
Question 2: How good looking & clean is your data?
□ (A) Yikes, not very. But it’s what’s on the inside that counts right?
□ (B) It’s ok. We’ve both let ourselves go a bit.
□ (C) It’s pretty cute. Not supermodel hot, but definitely girl or boy next door cute.
□ (D) My data is HOT! It’s perfect in every way!
Translation: Marketers need data that is reliable and clean. According to a recent Experian study, American companies believe that 25% of their data is inaccurate, and the rest of the world isn’t much more confident. 90% of respondents said they suffer from common data errors, and 78% have problems with the quality of the data they gather from disparate channels. Making marketing decisions based upon data that is inaccurate leads to poor decisions. And what’s worse, many marketers have no idea how good or bad their data is, so they have no idea what impact it is having on their marketing programs and analysis. The data ready marketer understands this and has a top tier data quality solution in place to make sure their data is in the best shape possible.
Question 3: Do you feel safe when you’re with your data?
□ (A) No, my data is pretty scary. 911 is on speed dial.
□ (B) I’m not sure actually. I think so?
□ (C) My data is mostly safe, but it’s got a little “bad boy” or “bad girl” streak.
□ (D) I protect my data, and it protects me back. We keep each other safe and secure.
Translation: Marketers need to be able to trust the quality of their data, but they also need to trust the security of their data. Is it protected or is it susceptible to theft and nefarious attacks like the ones that have been all over the news lately? Nothing keeps a CMO and their PR team up at night like worrying they are going to be the next brand on the cover of a magazine for losing millions of personal customer records. But beyond a high profile data breach, marketers need to be concerned over data privacy. Are you treating customer data in the way that is expected and demanded? Are you using protected data in your marketing practices that you really shouldn’t be? Are you marketing to people on excluded lists?
Question 4: Is your data adventurous and well-traveled, or is it more of a “home-body”?
□ (A) My data is all over the place and it’s impossible to find.
□ (B) My data is all in one place. I know we’re missing out on fun and exciting options, but it’s just easier this way.
□ (C) My data is in a few places and I keep fairly good tabs on it. We can find each other when we need to, but it takes some effort.
□ (D) My data is everywhere, but I have complete faith that I can get ahold of any source I might need, when and where I need it.
Translation: Marketing data is everywhere. Your marketing data warehouse, your CRM system, your marketing automation system. It’s throughout your organization in finance, customer support, and sales systems. It’s in third party systems like social media and data aggregators. That means it’s in the cloud, it’s on premise, and everywhere in between. Marketers need to be able to get to and integrate data no matter where it “lives”.
Question 5: Does your data take forever to get ready when it’s time to go do something together?
□ (A) It takes forever to prepare my data for each new outing. It’s definitely not “ready to go”.
□ (B) My data takes its time to get ready, but it’s worth the wait… usually!
□ (C) My data is fairly quick to get ready, but it does take a little time and effort.
□ (D) My data is always ready to go, whenever we need to go somewhere or do something.
Translation: One of the reasons many marketers end up in marketing is because it is fast paced and every day is different. Nothing is the same from day-to-day, so you need to be ready to act at a moment’s notice, and change course on a dime. Data ready marketers have a foundation of great data that they can point at any given problem, at any given time, without a lot of work to prepare it. If it is taking you weeks or even days to pull data together to analyze something new or test out a new hunch, it’s too late – your competitors have already done it!
Question 6: Can you believe the stories your data is telling you?
□ (A) My data is wrong a lot. It stretches the truth a lot, and I cannot rely on it.
□ (B) I really don’t know. I question these stories – dare I say excuses – but haven’t been able to prove it one way or the other.
□ (C) I believe what my data says most of the time. It rarely lets me down.
□ (D) My data is very trustworthy. I believe it implicitly because we’ve earned each other’s trust.
Translation: If your data is dirty, inaccurate, and/or incomplete, it is essentially “lying” to you. And if you cannot get to all of the data sources you need, your data is telling you “white lies”! All of the work you’re putting into analysis and optimization is based on questionable data, and is giving you questionable results. Data ready marketers understand this and ensure their data is clean, safe, and connected at all times.
Question 7: Does your data help you around the house with your daily chores?
□ (A) My data just sits around on the couch watching TV.
□ (B) When I nag my data will help out occasionally.
□ (C) My data is pretty good about helping out. It doesn’t take initiative, but it helps out whenever I ask.
□ (D) My data is amazing. It helps out whenever it can, however it can, even without being asked.
Translation: Your marketing data can do so much. It should enable you to be “customer ready” – helping you to understand everything there is to know about your customers so you can design amazing personalized campaigns that speak directly to them. It should enable you to be “decision ready” – powering your analytics capabilities with great data so you can make great decisions and optimize your processes. But it should also enable you to be “showcase ready” – giving you the proof points to demonstrate marketing’s actual impact on the bottom line.
Now for the fun part… It’s time to rate your data relationship status
If you answered mostly (A): You have a rocky relationship with your data. You may need some data counseling!
If you answered mostly (B): It’s time to decide if you want this data relationship to work. There’s hope, but you’ve got some work to do.
If you answered mostly (C): You and your data are at the beginning of a beautiful love affair. Keep working at it because you’re getting close!
If you answered mostly (D): Congratulations, you have a strong data marriage that is based on clean, safe, and connected data. You are making great business decisions because you are a data ready marketer!
Do You Love Your Data?
No matter what your data relationship status, we’d love to hear from you. Please take our survey about your use of data and technology. The results are coming out soon so don’t miss your chance to be a part. https://www.surveymonkey.com/s/DataMktg
Also, follow me on twitter – The Data Ready Marketer – for some of the latest & greatest news and insights on the world of data ready marketing. And stay tuned because we have several new Data Ready Marketing pieces coming out soon – InfoGraphics, eBooks, SlideShares, and more!
Data proliferation has traditionally been measured by the number of copies of data residing on different media. For example, if data residing on an enterprise storage device was backed up to tape, the proliferation was measured by the number of tapes on which the same piece of data would reside. Now that backups are no longer restricted to the data center and data is no longer constrained by the originating application, this definition is due for an update.
Data proliferation should instead be measured based on the number of users who have access to or can view the data, and it is a primary factor in measuring the risk of a data breach. My argument here is that as sensitive, confidential or private data proliferates beyond the original copy, it increases its surface area and proportionally increases its risk of a data breach.
Using the original definition of data proliferation and an example of data storage shown below, data proliferation would include production, production copies used for disaster recovery purposes and all physical backup copies. But as you can see, data is also copied to test environments for development purposes. When factoring in the number of privileged users with access to those copies, you have a different view of proliferation and potential risk.
In the example, there are potentially thousands of copies of sensitive data but only a small number of users who are authorized to access the data.
In the case of test and development, this image highlights a potentially high area of risk because the number of users who could see the sensitive data is high.
Similarly with online advertising, the measure of how many people see an online ad is called an impression. If an ad was seen by 100 online users, it would have 100 impressions.
When you apply that same principle to data security, you could say that data proliferation is a calculation of the number of copies of a data element multiplied by the potential number of users who could physically view the data, or in other words ‘impressions’. In this second image below, rather than considering the total number of copies, what if we measured risk based on the total number of impressions?
In this case, the measure of risk is independent of the physical media the data reside on. You could take this a few steps further and add a factor based on security controls in place to prevent unauthorized access.
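The impressions idea reduces to a small calculation: copies multiplied by potential viewers, optionally scaled by a factor for security controls. The Python sketch below uses hypothetical environment names and made-up figures, and the `control_factor` parameter is an assumed way to express the security-control adjustment mentioned above (1.0 means no mitigating control).

```python
def impressions(copies, users, control_factor=1.0):
    """Copies of a data element times the users who could view each copy."""
    return copies * users * control_factor


# Made-up figures for the environments discussed above
environments = {
    "production": {"copies": 1, "users": 50},
    "disaster_recovery": {"copies": 1, "users": 5},
    "tape_backups": {"copies": 1000, "users": 2},
    "test_and_dev": {"copies": 10, "users": 400},
}

by_environment = {
    name: impressions(**figures) for name, figures in environments.items()
}
riskiest = max(by_environment, key=by_environment.get)
```

With these sample numbers the tape backups hold by far the most copies, yet test and development ends up with the most impressions because so many users can see each copy. Passing a `control_factor` below 1.0 for an environment with strong controls would lower its share accordingly.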
This week, another reputable organization, Anthem Inc, reported it was ‘the target of a very sophisticated external cyber attack’. But rather than be upset at Anthem, I respect their responsible data breach reporting.
In this post, Joseph R. Swedish, President and CEO of Anthem, Inc., does something that I believe all CEOs should do in this situation. He is straight up about what happened, what information was breached, actions they took to plug the security hole, and services available to those impacted.
When it comes to a data breach, the worst thing you can do is ignore it or hope it will go away. This was not the case with Anthem. Mr Swedish did the right thing and I appreciate it.
You only have one corporate reputation – and it is typically aligned with the CEO’s reputation. When the CEO talks about the details of a data breach and empathizes with those impacted, he establishes a dialogue based on transparency and accountability.
Research tells us that 44% of healthcare and pharmaceutical organizations experienced a breach in 2014. And we know that personal information, when combined with health information, is worth more on the black market because the data can be used for insurance fraud. I expect more healthcare providers will be on the defensive this year and only hope that they follow Mr Swedish’s example when facing the music.
Patient experience is key to growth and success for all health delivery organizations. Gartner has stated that the patient experience needs to be one of the highest priorities for organizations. The quality of your data is critical to achieving that goal. My recent experience with my physician’s office demonstrates how easy it is for the quality of data to influence the patient experience and undermine a patient’s trust in their physician and the organization with which they are interacting.
I have a great relationship with my doctor and have always been impressed by the efficiency of the office. I never wait beyond my appointment time, the care is excellent and the staff is friendly and professional. There is an online tool that allows me to see my records, send messages to my doctor, request an appointment and get test results. The organization enjoys the highest reputation for clinical quality. Pretty much perfect from my perspective – until now.
I needed to change a scheduled appointment due to a business conflict. Since I expected some negotiation I decided to make the phone call rather than request it online. There are still transactions for which human to human is optimal! I had all my information at hand and made the call. The phone was pleasantly answered and the request given. The receptionist requested my name and date of birth, but then stated that I did not have a future appointment. I am looking at the online tool, which clearly states that I am scheduled for February 17 at 8:30 AM. The pleasant young woman confirms my name, date of birth and address and then tells me that I do not have an appointment scheduled. I am reasonably savvy about these things and figured out the core problem, which is that my last name is hyphenated. Armed with that information, my other record is found and a new appointment scheduled. The transaction is completed.
But now I am worried. My name has been like this for many years and none of my other key data has changed. Are there parts of my clinical history missing in the record that my doctor is using? Will that have a negative impact on the quality of my care? If I were to be unable to clearly respond, might that older record be accessed and my current medications and history not be available? The receptionist did not address the duplicate issue clearly by telling me that she would attend to merging the records, so I have no reason to believe that she will. My confidence is now shaken and I am less trustful of the system and how well it will serve me going forward. I have resolved my issue, but not everyone would be able to push back to ensure that their records are now accurate.
Many millions of dollars are being spent on electronic health records. Many more millions are being spent to redesign work flow to accommodate the new EHRs. Physicians and other clinicians are learning new ways to access data and treat their patients. The foundation for all of this is accurate data. Nicely displayed but inaccurate data will not result in improved care or enhanced member experience. As healthcare organizations move forward with the razzle dazzle of new systems they need to remember the basics of good quality data and ensure that it is available to these new applications.
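A duplicate like this one can often be caught by comparing records on a normalized key rather than on the name exactly as typed. The Python sketch below is only an illustration: the record layout, the `match_key` and `find_duplicates` functions and the sample patients are assumptions, and it handles just the punctuation-and-spacing case. Real patient matching also has to cope with cases such as only half of a hyphenated name being entered.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a comparison key that ignores case, hyphens and spaces in names."""
    last = re.sub(r"[^a-z]", "", record["last_name"].lower())
    first = record["first_name"].strip().lower()
    return (last, first, record["date_of_birth"])


def find_duplicates(records):
    """Return lists of record ids that share the same normalized key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up patient records
records = [
    {"id": 1, "first_name": "Pat", "last_name": "Doe-Smith", "date_of_birth": "1970-01-01"},
    {"id": 2, "first_name": "Pat", "last_name": "Doe Smith", "date_of_birth": "1970-01-01"},
    {"id": 3, "first_name": "Lee", "last_name": "Brown", "date_of_birth": "1982-05-09"},
]

duplicates = find_duplicates(records)
```

Flagging candidate pairs like these for review and merging is what keeps one patient's history from being split across two records.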