http://www.plosgenetics.org/home.action;jsessionid=234DE5F982FB7084D356E6B08CC79D1C.ambra02
The Characterization of Twenty Sequenced Human Genomes
Kimberly Pelak1#, Kevin V. Shianna1#, Dongliang Ge1#, Jessica M. Maia1, Mingfu Zhu1, Jason P. Smith1, Elizabeth T. Cirulli1, Jacques Fellay1, Samuel P. Dickson1, Curtis E. Gumbs1, Erin L. Heinzen1, Anna C. Need1, Elizabeth K. Ruzzo1, Abanish Singh1, C. Ryan Campbell1, Linda K. Hong1, Katharina A. Lornsen1, Alexander M. McKenzie1, Nara L. M. Sobreira2, Julie E. Hoover-Fong2, Joshua D. Milner3, Ruth Ottman4,5, Barton F. Haynes6, James J. Goedert7, David B. Goldstein1*
1 Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America, 2 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America, 3 Allergic Inflammation Unit, Laboratory of Allergic Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, United States of America, 4 G. H. Sergievsky Center and Departments of Epidemiology and Neurology, Columbia University, New York, New York, United States of America, 5 Division of Epidemiology, New York State Psychiatric Institute, New York, New York, United States of America, 6 Duke Human Vaccine Institute, Duke University, Durham, North Carolina, United States of America, 7 Infections and Immunoepidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America
Abstract
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.
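To make the "novel SNVs per genome" figure concrete, the sketch below shows one way such a discovery curve could be tallied: genomes are processed in order, and each variant counts as novel only the first time it appears. This is an illustration under stated assumptions, not the authors' pipeline. The function name `novel_variants_per_genome`, the representation of a variant as a (chromosome, position, allele) tuple, and the toy data are all hypothetical, and the study's own definition of novelty may also involve comparison against public variant databases, which is not modeled here.

```python
from typing import Hashable, Iterable, List, Set


def novel_variants_per_genome(genomes: Iterable[Set[Hashable]]) -> List[int]:
    """For each genome, in order, count variants not seen in any earlier genome."""
    seen: Set[Hashable] = set()
    counts: List[int] = []
    for variants in genomes:
        new = variants - seen      # variants absent from all earlier genomes
        counts.append(len(new))
        seen |= new                # remember them for later genomes
    return counts


# Toy data, invented for illustration only (not from the study).
# Each variant is a hypothetical (chromosome, position, alternate allele) tuple.
toy_genomes = [
    {("1", 100, "A"), ("1", 200, "T"), ("2", 50, "G")},
    {("1", 100, "A"), ("2", 50, "G"), ("3", 10, "C")},
    {("1", 200, "T"), ("3", 10, "C"), ("X", 5, "T")},
]

print(novel_variants_per_genome(toy_genomes))  # [3, 1, 1]
```

In a curve like this, the per-genome count falls quickly at first and then levels off as common variants are exhausted. The abstract reports a plateau of this kind at about 144,000 new SNVs per genome after the first 15 individuals.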
Author Summary
We report here the nearly complete genomic sequence of 20 different individuals, determined using “next-generation” sequencing technologies. We use these data to characterize the type of genetic variation carried by humans in a sample of this size, which is to our knowledge the largest set of unrelated genomic sequences that have been reported. We summarize different categories of variation in each genome, and in total across all 20 of the genomes, finding a surprising number of variants predicted to reduce or remove the proteins encoded by many different genes. This work provides important fundamental information about the scope of human genetic variation, and suggests ways to further explore the relationship between these genetic variants and human disease.
Citation: Pelak K, Shianna KV, Ge D, Maia JM, Zhu M, et al. (2010) The Characterization of Twenty Sequenced Human Genomes. PLoS Genet 6(9): e1001111. doi:10.1371/journal.pgen.1001111
Editor: Greg Gibson, Georgia Institute of Technology, United States of America
Received: January 27, 2010; Accepted: August 3, 2010; Published: September 9, 2010
This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
Funding: Funding was provided by the Bill and Melinda Gates Foundation grant 157412, with additional funding by the National Institute of Allergy and Infectious Diseases (NIAID) Center for HIV/AIDS Vaccine Immunology (CHAVI) grant AI067854. Funding for the collection of control samples was provided in part by RC2MH089915 from the National Institute of Mental Health, and Award Number RC2NS070344 from the National Institute Of Neurological Disorders And Stroke. This research was supported in part by funding from the Division of Intramural Research, NIAID, National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
* E-mail: d.goldstein@duke.edu
Full text: PLoS Genetics: The Characterization of Twenty Sequenced Human Genomes