Tag Archives: Big Data and Privacy
I just finished reading a great article from one of my former colleagues, Bill Franks. He makes a strong argument that Big Data is not inherently good or evil anymore than money is. What makes Big Data (or any data as I see it) take on a characteristic of good or evil is how it is used. Same as money, right? Here’s the rest of Bill’s article.
Bill framed his thoughts within the context of a discussion with a group of government legislators who I would characterize based on his commentary as a bit skittish of government collecting Big Data. Given many recent headlines, I sincerely do not blame them for being concerned. In fact, I applaud them for being cautious.
At the same time, while Big Data seems to be the “type” of data everyone wants to speak about, the scope of the potential problem extends to ALL data. Just because a particular dataset is highly structured into a 20 year old schema that does not exclude it from misuse. I believe structured data has been around for so long people are comfortable with (or have forgotten about) the associated risks.
Any data can be used for good or ill. Clearly, it does not make sense to take the position that “we” should not collect, store and leverage data based on the notion someone could do something bad.
I suggest the real conversation should revolve around access to data. Bill touches on this as well. Far too often, data, whether Big Data or “traditional”, is openly accessible to some people who truly have no need based on job function.
Consider this example – a contracted application developer in a government IT shop is working on the latest version of an existing application for agency case managers. To test the application and get it successfully through a rigorous quality assurance process the IT developer needs a representative dataset. And where does this data come from? It is usually copied from live systems, with personally identifiable information still intact. Not good.
Another example – Creating a 360 degree view of the citizens in a jurisdiction to be shared cross-agency can certainly be an advantageous situation for citizens and government alike. For instance, citizens can be better served, getting more of what they need, while agencies can better protect from fraud, waste and abuse. Practically any agency serving the public could leverage the data to better serve and protect. However, this is a recognized sticky situation. How much data does a case worker from the Department of Human Services need versus that of a law enforcement officer or an emergency services worker need? The way this has been addressed for years is to create silos of data, carrying with it, its own host of challenges. However, as technology evolves, so too should process and approach.
Stepping back and looking at the problem from a different perspective, both examples above, different as they are, can be addressed by incorporating a layer of data security directly into the architecture of the enterprise. Rather than rely on a hodgepodge of data security mechanisms built into point applications and silo’d systems, create a layer through which all data, Big or otherwise, is accessed.
Through such a layer, data can be persistently and/or dynamically masked based on the needs and role of the user. In the first example of the developer, this person would not want access to a live system to do their work. However, the ability to replicate the working environment of the live system is crucial. So, in this case, live data could be masked or altered in a permanent fashion as it is moved from production to development. Personally identifiable information could be scrambled or replaced with XXXXs. Now developers can do their work and the enterprise can rest assured that no harm can come from anyone seeing this data.
Further, through this data security layer, data can be dynamically masked based on a user’s role, leaving the original data unaltered for those who do require it. There are plenty of examples of how this looks in practice, think credit card numbers being displayed as xxxx-xxxx-xxxx-3153. However, this is usually implemented at the application layer and considered to be a “best practice” rather than governed from a consistent layer in the enterprise.
The time to re-think the enterprise approach to data security is here. Properly implemented and deployed, many of the arguments against collecting, integrating and analyzing data from anywhere are addressed. No doubt, having an active discussion on the merits and risks of data is prudent and useful. Yet, perhaps it should not be a conversation to save or not save data, it should be a conversation about access
“What really matters about big data is what it does. Aside from how we define big data as a technological phenomenon, the wide variety of potential uses for big data analytics raises crucial questions about whether our legal, ethical, and social norms are sufficient to protect privacy and other values in a big data world.”
These crucial questions, raised in a recent White House report on the implications of big data, frame a growing debate taking place across both society and the business world on how far organizations can push the limits with data collection and analysis. The report, issued by a presidential commission tasked with assessing big data’s privacy implications, explains how big data is a double-edged sword. While big data analytics pave the way to unexpected discoveries, innovations, and advancements in our quality of life, it also has the potential for abuse as well. As the report puts it, big data’s capabilities, “most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it.”
The report’s authors acknowledge that big data analytics is an engine of economic growth and a competitive tool for companies across all industries, as well as a tool for quality of life. “Used well, big data analysis can boost economic productivity, drive improved consumer and government services, thwart terrorists, and save lives,” the report states. In addition, there will likely be a profound impact as data analytics gets applied to the Internet of Things, which “have made it possible to merge the industrial and information economies.” In another example, healthcare providers and payers can employ predictive analytics to detect fraud and abuse in real time.
The report’s main thrust is personal privacy implications, and many these issues will inevitably shape the practices and policies of enterprises as they expand their businesses into the big data realm. The managers and professionals charged with identifying, collecting and analyzing information assets will increasingly be under pressure – as their organizations feel pressure – to understand the boundaries between insight, targeted engagement, and overreach.
For example, a still relatively unexplored area of big data is its ownership. Does data belong to those who collect it, or those who contribute to it? “Big data may be viewed as property, as a public resource, or as an expression of individual identity,” the report states.
Another challenge is the fact that many organizations will opt to assemble massive databases as they move forward with big data analysis. “Big data technologies can derive value from large data sets in ways that were previously impossible — indeed, big data can generate insights that researchers didn’t even think to seek.” For example, new tools and technologies provide for analysis across entire data sets, versus extracting a small representative subset of the data and extrapolating any results against a larger universe. However, with so much data, analysis may potentially be erroneous as well. “Correlation still doesn’t equal causation,” the report’s authors state. “Finding a correlation with big data techniques may not be an appropriate basis for predicting out-comes or behavior, or rendering judgments on individuals. In big data, as with all data, interpretation is always important.”
Another issue is the permanence of data – which also is a privacy issue. At the same time, this may also create headaches for corporate data managers as well. “In the past, retaining physical control over one’s personal information was often sufficient to ensure privacy,” the report states. “Documents could be destroyed, conversations forgotten, and records expunged. But in the digital world, information can be captured, copied, shared, and transferred at high fidelity and retained indefinitely. Volumes of data that were once unthinkably expensive to preserve are now easy and affordable to store on a chip the size of a grain of rice. As a consequence, data, once created, is in many cases effectively permanent. Furthermore, digital data often concerns multiple people, making personal control impractical.”
The report’s authors state that organizations need to take steps to address privacy issues, and suggest de-identification and encryption as technical solutions that are available at this time. However, in the long run, de-identification is still a weak approach to the problem. “Many technologists are of the view that de-identification of data as a means of protecting individual privacy is, at best, a limited proposition. In practice, data collected and de-identified is protected in this form by companies’ commitments to not re-identify the data and by security measures put in place to ensure those protections.”
Ultimately, the best methods to ensure the ethical use of data need to come through inspired and forward-thinking management. It takes judicious management, a commitment to training and education, and a focus on what nuggets of information matter the most to the business. Big data opens up many new vistas for enterprises, and those that take the high road will reap its rewards.