The Sanger Institute was a modest, little-known cluster of buildings in Cambridgeshire when genomics expert Tim Hubbard arrived in 1997. Burgeoning spending on genomic research has led to a rapid expansion: the institute now employs 800 people, more than double its workforce of five years ago. The latest addition is a 1,000-square-meter computer center ready to crunch the mass of genomic data scientists expect to generate in the next few years. With a budget of $8.5 million, Hubbard, the center's head of bioinformatics, is leading an international team that will probe deeper than ever before into the mysteries of the human genome.
His team is making the first systematic effort to catalog all the working components of the genome and establish their roles. The ultimate goal of the project, called ENCODE, for Encyclopaedia of DNA Elements, is to find new clues to the genetic basis of a host of common diseases, such as diabetes, Alzheimer's and perhaps even cancer, which will in turn suggest new drugs and therapies. "We expect to discover many more genome features that will help us to understand human biology and the role of genome elements in health and disease," says Hubbard.
It's an ambitious undertaking. Hubbard's team will extend the work of a pilot project, which last summer completed an investigation into 1 percent of the genome, or 30 million DNA base pairs, to the full 3 billion base pairs. Crunching that much data would be hard enough, but molecular biology has a way of throwing up new puzzles. The pilot project revealed a picture of unsuspected complexity, upsetting many aspects of the standard view of the body's innermost workings. Vast stretches of DNA, for example, once dismissed as "junk," now appear to play a critical role in our makeup.
Fortunately, scientists have technology on their side. Geneticists want to cross-correlate deviations in the genomes from person to person with the occurrence of disease, which requires a large sample size. The latest addition to the Sanger armory is a set of 30 rapid DNA sequencers, which will provide genomic data for analysis. At a cost of $6 million, the machines bring a vast reduction in the per-genome cost of sequencing. "Basically we are using a platform that is 100 times cheaper and 100 times faster than anything before," says Hubbard. "It changes the game entirely."
The work his lab is undertaking represents an entirely new approach to medical research. "At the moment you basically just throw a load of chemicals at the human body and see which ones stick. We are trying to build up our understanding right from the bottom," he says. "This is a very optimistic time."