Storing Digital Data for Eternity

Blue LEDs on a row of servers are pictured at Google's Data Center. Innovative storage solutions, from DNA to silica glass recordings, could ensure we never enter a digital dark age. Connie Zhou/Google/ZUMA

Vint Cerf is sometimes called the “father of the Internet.” He helped develop TCP/IP (the suite of communications protocols on which the Internet runs) and later became chairman of ICANN (the Internet Corporation for Assigned Names and Numbers, the nonprofit that coordinates the Internet’s domain names and IP addresses). But today he worries we’re heading into a digital dark age. “People think by digitizing photographs, maps, we have preserved them forever,” he says, “but we’ve only preserved them forever if we can continue to read the bits that encode them.”

Save a file on a thumb drive, say, and several years later your computer (and your friends’ computers) might not even know how to read it. The company that made the drive, or the software that read it, may have long since gone out of business, and its engineers may have moved on or passed away. It’s happened to the best of us, and the best of the U.S.: In 1975, NASA launched Viking 1 and Viking 2, two deep space probes to Mars. The agency’s Jet Propulsion Laboratory recorded information from the mission on magnetic tape in a format that was state-of-the-art. But just 10 years later, no one at NASA had the skills or software to “read” it, and up to 20 percent of the Viking mission data was lost forever.

The moral is to be skeptical of the promises of technology. Services like Google Drive and Dropbox store all your data “in the cloud.” That sounds pretty numinous, but all it means is that your files are saved on the provider’s servers. Now, if you end up stymied by USB obsolescence, or even if you spill coffee on your keyboard and fry your computer, you can still access your documents as long as you can get onto Google Drive. It feels awfully secure, but there’s no guarantee that it’s forever. Google could go out of business or sell off servers to someone who decides to wipe them clean. If the company were to shut down Google Drive, it would likely give customers ample time to move their data elsewhere. But what happens if you aren’t around to follow the instructions? Morbid as it is, imagine passing away with your photos and files on the drive, password-protected and probably forgotten. Who’ll tend your inbox after you die?

Compounding the problem: Digital equipment, compared with clay or paper, isn’t very durable. Hard drives, flash drives, floppies and CD-ROMs all lack serious longevity. Servers, for instance, have to be replaced about every five years. Leave a server farm alone too long and its stored data will degrade and become inaccessible at a pace much faster than that of its analog predecessors.

That’s why several projects are underway to build a form of storage for digital data that doesn’t degrade. Peter Kazansky and his partners at the University of Southampton, for example, are using lasers to turn silica glass into what is, for all intents and purposes, an infinite storage device. The glass, a modified quartz, is one of “the most stable materials on earth,” says Kazansky. Under normal conditions, he says, it can store data for billions of years.

Using alterations in the way quartz refracts light, we could store data in a superdense form for centuries, according to researchers. University of Southampton

The silica glass is costly, says Kazansky: A bare 5-inch silica glass disc costs about $500. The ultrafast lasers used to record data on the discs also come with a hefty price tag: $100,000. He says the glass could eventually be produced at commercial scale, and that the price “could be reduced 10 to 100 times in mass production.”

Kazansky hopes his invention will eventually be used by “national archives, museums, libraries” and private organizations with a lot of data. “Companies have to back up their archives every five to 10 years because hard-drive memory has a relatively short life span,” he says. Contrast that with the copy of the Bible that Kazansky and his team have recently recorded in glass: Kazansky predicts the recording will “survive the human race.”

Meanwhile, the Japanese engineering conglomerate Hitachi has also begun to develop its own method of recording digital data on glass; company representatives say their product can store data for 100 million years.  

But both Southampton and Hitachi are held back by a problem that conventional digital storage solved long ago: space. Hitachi’s and Southampton’s storage mechanisms both top out at 40MB per square inch. That’s better than a CD (which can store at most 35MB in the same space) but nowhere near as good as a standard hard disk, which can hold up to a terabyte per square inch.

One promising proposal comes from inside your body. At great magnification, your DNA, or that of any other living organism, looks like a lovely double helix built around four chemical bases. You might remember them from high school biology: adenine, guanine, thymine and cytosine. What’s interesting about A, G, T and C is that they can be arranged in sequences that encode any digital information, from text in English or Mandarin to programs in Python or Swift, much as Morse code uses patterns of dots and dashes to send sentences across countries.
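To make the Morse-code comparison concrete, here is a minimal sketch that assumes the simplest possible scheme: each base stands for two bits (A = 00, C = 01, G = 10, T = 11), so every byte of text becomes four bases. The mapping and the function names encode and decode are hypothetical and purely illustrative. Working DNA-storage systems use more elaborate codes that avoid error-prone sequences, such as long runs of a single base, and that add redundancy for error correction.

```python
# Hypothetical 2-bits-per-base mapping, for illustration only.
BASES = "ACGT"  # positions 0..3 stand for the bit pairs 00, 01, 10, 11


def encode(text: str) -> str:
    """Turn text into a DNA-style string of bases, four bases per byte."""
    strand = []
    for byte in text.encode("utf-8"):
        # Walk the byte's four 2-bit chunks, most significant first.
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> str:
    """Reverse encode(): every four bases rebuild one byte."""
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


if __name__ == "__main__":
    strand = encode("Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # Hi
```

Because the text is first converted to UTF-8 bytes, the same routine handles English, Mandarin or source code alike. The DNA only ever sees a stream of bits.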

Because DNA packs information so tightly, it can outcompete all conventional storage: It can hold a mind-blowing 700 terabytes per gram. Bio-artist Joe Davis, for example, recently used synthetic biology to stick a DNA-encoded version of the entirety of Wikipedia inside an apple. George Church, the Harvard geneticist whose lab helped pioneer DNA data storage, has stored 70 billion copies of his book, Regenesis, in a drop of synthetic DNA smaller than the period at the end of this sentence. Under ideal conditions, says Church, those books will last 700,000 years. To give a sense of that time scale, the Gutenberg Bible, the first major book printed with movable type in Europe, was produced just 560 years ago.

Right now, the process is too slow to be practical. With current-day sequencing technology, one can read at most 12.5GB per day from DNA. That’s about 16 hours of film, which sounds like a lot until you consider how fast your computer can download a movie (hint: at that reading rate, a 90-minute film would take more than two hours to recover from DNA). In addition, both writing and reading DNA-encoded data require complex machinery that only a few specialized labs can access, and that machinery is just as exposed to human and natural hazards as NASA’s magnetic tapes were.
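The reading-speed comparison comes down to simple arithmetic. The sketch below uses only the two figures quoted above (12.5GB read per day, equal to roughly 16 hours of film). The function names are hypothetical, and the results are only as good as those estimates.

```python
# Figures quoted in the text; both are rough estimates.
GB_READ_PER_DAY = 12.5     # maximum sequencing throughput
FILM_HOURS_PER_DAY = 16.0  # hours of film that 12.5GB represents


def hours_to_read(film_minutes: float) -> float:
    """Hours of sequencing needed to read back a film of the given length."""
    film_hours_per_read_hour = FILM_HOURS_PER_DAY / 24  # about 0.67
    return (film_minutes / 60) / film_hours_per_read_hour


def read_rate_kb_per_s() -> float:
    """Sequencing throughput in kilobytes per second (1GB = 1,000,000KB)."""
    return GB_READ_PER_DAY * 1_000_000 / (24 * 3600)


if __name__ == "__main__":
    print(f"90-minute film: {hours_to_read(90):.2f} hours")  # 2.25 hours
    print(f"Read rate: {read_rate_kb_per_s():.0f} KB/s")     # about 145 KB/s
```

In other words, at the quoted rate DNA is read back more slowly than the film itself plays: about two-thirds of an hour of footage per hour of sequencing, or roughly 145KB per second.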

The Long Now Foundation, a nonprofit devoted to long-term thinking, may have a solution that could help our information survive a digital (or other) apocalypse, and maybe even help our survivors rebuild. The Rosetta is a 3-inch nickel disk etched with 13,000 pages’ worth of linguistic information. Much of it is made up of parallel texts: the same words in many languages, like the project’s archaeological namesake, the Rosetta Stone. For example, the Rosetta includes the first three chapters of Genesis, the first book of the Bible, in 1,500 different languages. “We’re not a religious organization,” says Laura Welcher, the Rosetta’s curator. But the group needed texts that had been translated into as many languages as possible, even the least commonly spoken, in order to create the most comprehensive translation key possible for future generations. “It turns out that there are missionaries around the world who are working on Bible translations,” says Welcher. The rest of the Rosetta library includes “the 3,500 books most essential to sustain or rebuild civilization.”

Currently, each page on the disk is 400 microns across, about the width of five human hairs. That sounds small, but compared with DNA, it’s gigantic. Each page can be read with a standard optical microscope that uses the same magnifying techniques we’ve been using for hundreds of years. “We could have put the information on the disk at a much higher density, made the pages much smaller so you’d have to read them with an electron microscope,” says Welcher, “but it takes a long time for a society to get to the point where they can magnify to that extent.” In other words, even come the apocalypse, the Rosetta will be readable.

Long Now is also looking to help documents withstand a more mundane, “I can’t read this floppy!” kind of digital darkness. The group is developing the Long Server, an ever-growing collection of file-conversion resources. Got a bunch of old .pcx files you’d love to convert to .jpgs? Long Server’s Format Exchange can help you out.

Cerf, who started this conversation, wants to go Format Exchange one better: He’s called for the creation of “digital vellum,” a technique for packaging and storing digital files along with all the software needed to interpret them. For example, suppose you save a Microsoft Word document, created on an Apple computer running OS X 10.8.5, as a piece of digital vellum. If you open it in 100 years, whatever machine you have that can decode computer data will have everything it needs to take you back in time. It would be able to reconstruct that same Apple computer, run OS X 10.8.5 and whatever version of Word you had installed, and open the document exactly as it was.

If you’ve ever used virtual-machine software such as Parallels to run Windows on a Mac, that’s more or less what digital vellum would be like, except that instead of running a current operating system, you’d be emulating a system from a previous century, from the chip structure on up.

One thing’s for sure, though: If we want to get this done, we’d better start soon. Digitization has created an environment where we can now produce an enormous amount of data—90 percent of all data ever generated by human beings has been created in the past two years, according to IBM. Safeguarding even a fraction of that information could give us the richest historical record the human race has ever known. Failing to preserve that information would mean that the records of one of the most innovative eras in history could be lost.