In early 2014, BuzzFeed's poisonously shareable quizzes turned out to be an unprecedented boon for traffic. According to blogger and analytics expert Dan Barker, they've also become an unprecedented opportunity for data mining. "Shareable," in this case, takes on a new meaning—it's not about sharing BuzzFeed's Facebook-maximized content on social media. It's about sharing a range of personal information with the site in the context of quizzes like "Which Backstreet Boy Should You Actually Marry?" and "Which Disney Princess Are You?"
In a new blog post, Barker explains what that information looks like. "When you visit BuzzFeed, they record lots of information about you," he writes. "Most websites record some information. BuzzFeed records a whole ton." In any given visit, Barker writes, BuzzFeed records answers to the following (in addition to what's passed along to Google for analytics purposes):
- Have you connected Facebook with BuzzFeed?
- Do you have email updates enabled?
- Do they know your gender & age?
- How many times have you shared their content directly to Facebook & Twitter & via email?
- Are you logged in?
- Which country are you in?
- Are you a BuzzFeed editor?
- …and about 25 other pieces of information.
If a user is logged in, that pool of information also includes his or her username. When quizzes are involved, things get trickier. Most of the quiz topics—boy bands, Disney princesses and the like—are pretty frivolous. Some of them aren't.
Consider Barker's example, the mega-viral "How Privileged Are You?" quiz. It has received well over 2 million views, which suggests that as many as 2 million people may have told BuzzFeed whether they have been raped, attempted suicide or taken medication for mental health reasons. For every answer, BuzzFeed records a unique ID.
"If I had access to the BuzzFeed Google Analytics data, I could query data for people who got to the end of the quiz and indicated—by not checking that particular answer—that they have had an eating disorder," Barker writes. "Or I could run a query along the following lines if I wished: Show me all the data for anyone who answered the 'Check Your Privilege' quiz but did not check 'I have never taken medication for my mental health.'"
BuzzFeed's senior communications manager, Christina DiRusso, says not to worry—the data are recorded only in bulk.
"We anonymize all usage data and have strict internal policies around only accessing data in the aggregate form," DiRusso wrote in an email to Newsweek. She added that storing personally identifiable information, or PII, would violate Google Analytics' terms of service. "We are only interested in data in the aggregate form. Who a specific user is and what he or she is doing on the site is actually a useless piece of information for us. We know how many people got Paris or prefer espresso in the 'Which City Would You Live In?' quiz, but we don't know who they are or any of their PII."
According to DiRusso, 99 percent of BuzzFeed users aren't logged in. But the other 1 percent (which is subsequently anonymized) is still a huge number, given BuzzFeed's traffic figures. "From a technical point of view, it would be really easy to link pseudonyms to real users, and is a fairly common practice," Barker told The Independent. But BuzzFeed probably doesn't, he acknowledges:
"But BuzzFeed say specifically they do not and, as a fairly transparent company, I would be inclined to take their word for it. It's also worth mentioning that this is a total minefield and lots of website owners don't fully understand what data they're recording."
That last point is important: Most of BuzzFeed's staff probably isn't aware of how much data the company has access to. But the information is there whether they use it or not, and whether it's anonymized or not. The concern, then, is less about how BuzzFeed is making use of quiz responses today than about how a BuzzFeed imitator (of which there are dozens—the latest, PlayBuzz, has somehow racked up 53 million uniques in a few months) might do so a year from now.
If nothing else, it's a new surveillance model for the National Security Agency. It's simple: They can just quit data-mining phone calls and emails and instead build up a viral media vertical, bombard Americans with irresistible quizzes and take it from there.