Genomics: In search of rare human variants
* Rasmus Nielsen
Journal name: Nature
Volume: 467
Date published: (28 October 2010)
Published online 27 October 2010
The 1000 Genomes Project has completed its pilot phase, sequencing the whole genomes of 179 individuals and characterizing all the protein-coding sequences of many others. Welcome to the third phase of human genomics. See Article p.1061
The goal of the 1000 Genomes Project1 is to find most of the variants in the human genome that have a frequency of at least 1% in the populations studied. The consortium of researchers participating in the project now reports the results of its pilot phase (page 1061 of this issue2).
But first let's take a step back. A decade ago, the reference copy of the human genome was sequenced3, 4. Although that project is undoubtedly one of the greatest scientific achievements of our time, its potential societal impact will be fully realized only if genomic regions that are responsible for various traits of medical importance, such as response to a drug or susceptibility to a disease, can be identified. After the initial sequencing of the human genome, therefore, a second phase of human genomics emerged, focusing on identifying genomic variations responsible for hereditary diseases and other medically relevant traits. The studies that define this phase, genome-wide association studies (GWAS), examine the genomes of thousands of individuals for correlations between the presence of genomic variants and the trait of interest.
Many successes have come out of GWAS5, 6, but there has also been some disappointment that perhaps the pickings from these studies have been too slim7. For instance, although certain disorders — including obesity, diabetes and cardiovascular disease — are known to have a strong genetic component, their associated genomic variants detected through GWAS cannot explain most of the experimentally identified genetic effects found in affected families. Human geneticists call this problem the 'missing heritability'7.
There are many possible explanations for the missing heritability, the most popular being the effect of rare variants. GWAS are based on examining a battery of different variants across the genome. Until recently, however, the cost of including both common and rare variants in such studies was prohibitively high, pushing the focus towards identifying common variants that occur at a relatively high frequency in the population. Consequently, if many rare variants, rather than a few common ones, are responsible for a disease, the rare variants would have been missed in most GWAS.
An obvious solution to this problem is to sequence whole genomes. But this is easier said than done: GWAS require sample sizes of thousands, making whole-genome sequencing extremely expensive. However, computational-biology studies have provided crucial insight that is helping to pave the way for more-comprehensive genomic studies. The idea is that if most variants, both common and rare, can be characterized in just a few individuals through whole-genome sequencing, then only a relatively small battery of variants needs to be identified in the remaining individuals in the genome-wide association study; the rest of their variants can be inferred computationally on the basis of the few whole-genome sequences.
Sceptics may find this notion — using the data from some individuals to 'invent' data for others — alarming. But if done correctly, this method, called imputation8, can significantly increase the statistical power of GWAS (Fig. 1). This idea is one of the main motivating forces behind the 1000 Genomes Project.
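To make the idea of imputation concrete, here is a deliberately simplified sketch in Python. It is not the method used by the 1000 Genomes Project or by any real imputation software, which typically rely on probabilistic models of how haplotypes are shared between individuals. It assumes that each fully sequenced chromosome is a list of 0/1 alleles, that a study individual has been typed at only a few of those sites, and that the missing alleles can be copied from the best-matching sequenced chromosome. The function name, the toy panel and the typed sites are all hypothetical.

```python
from typing import Dict, List


def impute_haplotype(reference_panel: List[List[int]],
                     typed: Dict[int, int]) -> List[int]:
    """Toy imputation: copy untyped alleles from the closest reference haplotype.

    reference_panel: fully sequenced haplotypes, one 0/1 allele per site.
    typed: alleles actually measured in the study individual, keyed by site index.
    """
    if not reference_panel:
        raise ValueError("reference panel is empty")

    def mismatches(haplotype: List[int]) -> int:
        # Count disagreements at the sites that were actually typed.
        return sum(haplotype[site] != allele for site, allele in typed.items())

    # Pick the reference haplotype that agrees best with the typed sites
    # (ties are broken in favour of the earliest haplotype in the panel).
    best = min(reference_panel, key=mismatches)

    # Start from the best match, then keep the observed alleles wherever we have them.
    imputed = list(best)
    for site, allele in typed.items():
        imputed[site] = allele
    return imputed


# Hypothetical panel of three sequenced haplotypes over six variant sites.
reference_panel = [
    [0, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1],
]

# The study individual was typed only at sites 0, 3 and 5.
typed = {0: 1, 3: 1, 5: 1}

print(impute_haplotype(reference_panel, typed))
```

In this toy example the third haplotype matches all three typed sites, so its alleles at sites 1, 2 and 4 are copied over. The sketch also shows why the sequenced panel matters: a variant can be imputed only if it is carried by one of the sequenced chromosomes.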