Keeping track of what we reveal about ourselves each day—through email and text messages, Amazon purchases and Facebook "likes"—is hard enough.
Imagine a future when Big Data has access not only to your shopping habits, but also to your DNA and other deeply personal data collected about your body and behavior, and about the inner workings of your proteins and cells. What will the government and others do with that data? And will we be unaware of how it's being used, or abused, until a future Edward Snowden emerges to tell us?
Consider this scenario: A few years from now the National Security Agency hires a young analyst trained in cyber-genetics. She is assigned to comb through millions of DNA profiles in search of markers that might identify terrorists and spies and other persons of interest. It's simple enough, since almost every American and billions of other people have deposited their complete genomes—every A, C, T and G in their cells—into one of the huge new digital health networks, the new Googles and Verizons of medical data.
Sequencing a person's entire DNA profile will be as cheap as getting a car wash. High-end automobiles and hotels are likely to have installed photonic (light) sensors—devices that quickly read small segments of DNA in a customer's skin cells to confirm their identity—to unlock doors. Banks may offer DNA-secure accounts that can only be accessed by a person with the correct genetic code.
People in this future world will be accustomed to genetics guiding treatments and saving lives, even as they remain uneasy about who exactly has access. Employers? Insurers? The government? Their spouse or lover?
With her top-secret clearance, the NSA's new analyst discovers that the agency has accessed the genetic records of not only suspected terrorists, but also heads of state and leaders in industry, academia, the arts and the news media. Troubled by what she has learned, the analyst announces that she's taking a vacation, and flies to a neutral country carrying top-secret cyber-genetic documents stored on an encrypted nanochip. Like Edward Snowden, she gives her data to a reporter, with the hope of rectifying the injustices she has witnessed.
For better or worse, we're not there yet. In 2014, neither the government nor the private sector is anywhere near having a World Wide Web for genetic and other personal molecular data, or a global wireless network that can access anyone's genetic data from anywhere. If this were the Internet, the technology would be in about 1985, at the very beginning.
Physicians, however, are already using genomics to predict and diagnose diseases such as breast cancer and macular degeneration. Thousands of parents use prenatal genetic tests to check whether their embryo or fetus carries genes for devastating diseases such as Tay-Sachs or Fragile X syndrome. Researchers have discovered genetic markers that identify mutations in cancerous tumors, allowing doctors to match specific chemotherapy drugs to a patient's own mutations and leading, in some cases, to astonishingly high rates of remission.
In the past two decades, the drug industry and government agencies like the National Institutes of Health have plowed hundreds of billions of dollars into turning genetics from a research project into something real. AT&T, Verizon, IBM and other IT giants are developing digital health networks and products, while thousands of start-ups are in a mini-frenzy to create networks and apps of their own.
Some companies, including Google-backed 23andMe, have begun to provide customers with access to their own genetic data. (23andMe actually stopped providing customers with genetic health data after being warned by the FDA that it needs approval for some of these tests; the company says it is working to fix this.) Labs and companies are also in the very early stages of developing devices that read short DNA sequences using light waves or a simple pinprick of blood.
In January, San Diego-based Illumina, a gene-sequencing company, announced that it can now sequence an entire genome for only $1,000. This may sound pricey, but just a decade ago a single human genome cost hundreds of millions of dollars to sequence. The price is likely to fall even further in the coming years.
This year, the number of people having their genomes sequenced could top 50,000, and that number should increase exponentially over the next few years as governments and health-care systems announce projects to sequence hundreds of thousands of people. Last year the U.K. announced plans to sequence 100,000 citizens by 2017. In the U.S., Kaiser Permanente has teamed up with the University of California at San Francisco to sequence 100,000 patients.
Eventually the mountains of data generated by our DNA and digital health records will be linked to Facebook and Twitter pages (or the future equivalent), and to those pink suede shoes you just bought and shared on the latest incarnation of Instagram. We may not like it, but the reality is that we give up this type of information to these companies every day. And if people want to keep getting the services these companies provide, they're going to keep trading data for them.
The result in a few years will be staggeringly complex statistical models designed to predict your behavior and to identify personality types, including those prone to violence or terrorism. In 2008 Congress passed the Genetic Information Nondiscrimination Act, which bars health insurers and employers from using DNA to discriminate. Beyond this, however, we have few protections.
Genetic predictions will not be perfect or deterministic. It turns out that DNA is only part of the equation that makes you who you are or will be. Using genetic profiling to identify terrorists or other personality types will also be imprecise and fraught with errors. Yet the more data amassed about individuals over time, the more accurate the models that generate the predictions will become.
For instance, scientists in a 2008 study associated a variant of the MAOA gene (the so-called "warrior gene") with a predilection for violent behavior in some people. The statistical strength of this correlation is weak, and even if you have that genetic marker, you may in fact be a full-on pacifist. But let's say that one afternoon you, as a carrier of this gene variant, "liked" an essay by a former Palestinian commando turned diplomat. An hour later you got curious about Al-Qaeda and did a quick Google search. What if some search algorithm at the NSA then connected your social media data to your DNA? The next thing you know, the Transportation Security Administration is stopping you from boarding your flight home for the holidays.
This is just one hypothetical example. As we rush into an era of bigger and better data being crunched by legions of government and private sector employees, we may have to get used to our health information being hacked and interpreted incorrectly or in ways that might work against us. Of course, it would be better to have an open debate and transparent policies about this type of data now.
Failing that, we may wake up one morning to read that the NSA once again has been spying on us. Only this time, it won't be about whom we called or texted, but about the secrets buried deep inside our cells that tell us a great deal about who we are and who we might be in the future.
Editor's note: An earlier version of this article contained a quote mistakenly attributed to Eric Topol. It has been removed.