Raw Data Is Both An Oxymoron and a Bad Idea: Data Should Be Cooked With Care

“Raw data is both an oxymoron and a bad idea. On the contrary, data should be cooked with care.” Geoff Bowker made this statement in 2005, and it served as the opening line of a recent talk by Kate Crawford, principal researcher at Microsoft Research New England and a visiting MIT professor, who urged that big data be adopted and handled cautiously.

In her keynote at the recent DataEDGE 2013 conference, held at the University of California at Berkeley, Crawford said now is the time to discuss the implications of big data for business and society.

She outlined the six myths that have arisen around big data:

Myth #1: Big data is new. References to big data began to pop up in the literature in the late 1990s, but some prominent industries, such as financial services and oil, have been wrestling with it for decades, Crawford says. What is new, however, “is the fact that a lot of the tools of big data are becoming more easily reached by a lot more people. We’re having an explosion in ideas, creativity and imagination in terms of what we can do with these technologies.” This is the time to discuss the implications of big data, she adds, because much of it will be invisible within a few years as the tools and technologies mature. “Really usable systems and really good technologies disappear,” she states. “The easier they are to use, the harder they are to see.”

Myth #2: Big data is objective. Actually, big data sets can be very biased, Crawford states. For example, she says, she pored over 20 million tweets sent out about Hurricane Sandy, which flooded her neighborhood in Manhattan last year. While the tweets tell a compelling story about how residents coped, they mainly represent the views of younger, more well-to-do Manhattan residents. “If we look a little closer at the tweets, most were coming out of Manhattan, which has a higher concentration of people using smartphones, and a higher concentration of Twitter users – a subset of a subset. There were very few tweets coming from the far more affected areas, such as Breezy Point or the Far Rockaways. Because we don’t have the data from those places, we essentially have very privileged urban stories. We have to be really clear who we’re talking about, we have to think about what this data really represents,” she says.

Myth #3: Big data doesn’t discriminate. “There’s a myth that says essentially because you’re dealing with large data sets, you can somehow avoid group-level prejudice,” Crawford cautions. She pointed to a recent study of the Facebook “likes” of 60,000 people that found such data can be used to identify a person’s race, sexual orientation, religious views, political leanings, and even whether they have previously used drugs or alcohol. “The researchers also expressed a set of concerns that this data can be bought by anyone. Ultimately, employers can make decisions about individuals based on this data.”

Myth #4: Big data makes cities smarter. While big data goes a long way toward improving the management of city problems, it may also under-represent communities. “Not all data is created or collected equally – there are always certain communities of people who are going to be left out of those data sets,” Crawford says. For example, last year, the city of Boston released an app called StreetBump, which automatically registered potholes by passively collecting GPS data from drivers’ smartphones. The program collected a great deal of data on potholes. However, she adds, “wealthier younger citizens are more likely to have smartphones, and therefore, wealthy areas with younger people would get more attention, while areas with older residents with less money will get fewer resources.”

Myth #5: Big data is anonymous. Crawford cited a recent study, published in Nature, that determined individuals could be identified with no more than four data points, including their cell phone number. Before the advent of personal technology, it took about 12 data points to identify an individual. “It’s very difficult to make data anonymous – even with two randomized data points, it’s possible to identify 50 percent of people.” Another big data initiative, the smart grid being adopted by electric utilities, will capture a wealth of data – from energy usage to “when you have friends over, when you are sleeping. This is some very intimate data.”
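
To make the idea of identifying someone from a handful of data points concrete, here is a minimal Python sketch. It is not the method or data of the study Crawford cited: the traces are synthetic (place, hour) pairs, and the function names `make_traces` and `unicity` and all of the parameters are assumptions chosen for illustration. It simply measures what fraction of people are the only ones whose record contains k points drawn from their own trace.

```python
import random


def make_traces(n_people, n_places, n_hours, points_per_person, rng):
    """Build synthetic traces: each person gets a set of (place, hour) points.

    Purely illustrative data, not drawn from any real dataset.
    """
    traces = {}
    for person in range(n_people):
        points = set()
        while len(points) < points_per_person:
            points.add((rng.randrange(n_places), rng.randrange(n_hours)))
        traces[person] = points
    return traces


def unicity(traces, k, rng):
    """Fraction of people uniquely pinned down by k points from their own trace."""
    unique = 0
    for person, points in traces.items():
        # Pick k known points about this person (sorted for reproducibility).
        known = set(rng.sample(sorted(points), k))
        # Everyone whose trace contains all of the known points.
        matches = [p for p, pts in traces.items() if known <= pts]
        if matches == [person]:
            unique += 1
    return unique / len(traces)


if __name__ == "__main__":
    rng = random.Random(0)
    traces = make_traces(n_people=300, n_places=20, n_hours=24,
                         points_per_person=30, rng=rng)
    for k in (1, 2, 3, 4):
        print(f"{k} known point(s): {unicity(traces, k, rng):.0%} uniquely identified")
```

In this toy setup a single point almost never singles anyone out, while a few points together usually do. The exact percentages depend entirely on the assumed parameters and say nothing about real populations.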

Myth #6: You can opt out of big data. There are suggestions that people will be able to protect their privacy if they pay a fee for web services to opt out of tracking, versus using services for free in exchange for giving up some information. Crawford cautions that this will result in a two-tier system, which “turns private data into a luxury good rather than a public good.”

Rather than making data privacy and management an individual choice, Crawford urges a more public discussion on “the way that the data is essentially flowing between corporations, individuals and governments.”
