Facebook won an important skirmish in the privacy wars this week. That is one way, at least, to read the news that the social-networking startup successfully forced Pete Warden, a tech entrepreneur and former Apple engineer, to delete a data set culled from Facebook's public profile pages. Warden had spent several months and tens of thousands of dollars crawling the public pages and collecting data on individuals—their location, fan pages, and a sampling of their friendships. Warden planned to release the data to a select group of academics, who wanted to use it to study how social networks affect disease transmission or unemployment. When Facebook threatened to sue, Warden agreed to delete the whole thing. And thus Facebook beat back one of the hacker-barbarians at the gate.
Of course, there is another way to read this: that this is more about PR than privacy. Facebook was once the dorm-room project of Mark Zuckerberg, who hacked into university IT systems to expand his popular project; now it's a major corporation, valued by some at $14 billion and eyeing an IPO. Warden and his data set threatened to embarrass the company, which is no stranger to privacy-related PR nightmares. (See: Beacon.) Facebook's lawyers acted accordingly. David has grown into Goliath and is sending cease-and-desist letters to anyone with a slingshot.
The second storyline seems more plausible. Of course, it's true that there are massive privacy implications whenever someone collects a data set including lots of personal information. During an interview in New York City in February, Warden admitted that one of the first things he did after compiling the data was delete the information from Iran. Why? He didn't want to give President Mahmoud Ahmadinejad's security snoops a ready-made tool for discovering which Iranians had, say, joined the fan page of opposition candidate Mir Hossein Mousavi. The snoops could have found the same information on their own quite easily—all it takes is a Google search—but there is power in aggregation, and Warden didn't want to unintentionally aid a cruel and indiscriminate regime. Moreover, before releasing the rest of the data set to a carefully vetted group of academics, Warden planned to delete identifying information like names—to "anonymize" it, as programmers say. But a well-publicized 2008 research paper by computer scientists at the University of Texas at Austin showed that it's next to impossible to scrub data totally clean. They successfully "de-anonymized" parts of a huge data set of customers' movie ratings released by Netflix.
Scary stuff, right? But as Warden points out, "the data I was planning to release is already in the hands of lots of commercial marketing firms." Companies like Rapleaf and InfoUSA hoard our information, collecting everything from individual e-mail addresses to Facebook fan-page membership to household income. A huge network (including banks, credit-card companies, and, yes, magazines) constantly trades this data. And Warden's censure will do nothing to stop that. Indeed, the public profile pages he crawled remain public. Marketers, spammers, and hackers are free to re-create Warden's effort. And presumably they won't be brash enough to blog about it, so Facebook may never know. Nor will they be philanthropic enough to give away their information hoard to public-health researchers.
Let's take a tally. Facebook has succeeded in stopping one responsible, cautious entrepreneur from delivering public information to public-interest-minded researchers. Meanwhile, countless companies are using the same information for more mercenary reasons; some may even be using it for illegal pursuits like phishing for passwords or hacking bank accounts. Sounds like Goliath is targeting slingshots when there are IEDs all over the place.