Monday, January 20, 2014

Second-Pandemic Strain of Vibrio cholerae from the Philadelphia Cholera Outbreak of 1849 — NEJM

Alison M. Devault, M.A., G. Brian Golding, Ph.D., Nicholas Waglechner, M.Sc., Jacob M. Enk, M.Sc., Melanie Kuch, M.Sc., Joseph H. Tien, Ph.D., Mang Shi, M.Phil., David N. Fisman, M.D., M.P.H., Anna N. Dhody, M.F.S., Stephen Forrest, M.Sc., Kirsten I. Bos, Ph.D., David J.D. Earn, Ph.D., Edward C. Holmes, Ph.D., and Hendrik N. Poinar, Ph.D.
January 8, 2014. DOI: 10.1056/NEJMoa1308663
Cholera is a diarrheal disease caused by colonization of the intestines by cholera toxin–expressing strains of the waterborne enteric bacterium V. cholerae. An outbreak can arise suddenly, especially in vulnerable populations with compromised sanitation infrastructure, as in the devastating 2010 outbreak in Haiti.1 In 2012 alone, V. cholerae infected 3 million to 4 million people, killing nearly 100,000.2 Although all pathogenic V. cholerae strains share a similar genomic backbone that may have facilitated adaptation to the human intestinal mucosa,3,4 the predominant pathogenic serogroup, O1, comprises two genetically distinct biotypes: classical and El Tor (for descriptions of these and other terms, see the Glossary). In the 20th century, for unknown reasons, El Tor replaced classical as the dominant biotype. There have been seven documented pandemics since 1817,5 but the causal V. cholerae strains have been genetically characterized only for the two most recent ones. Therefore, although the classical biotype is widely assumed to have caused the earlier pandemics,6 the identities of the strains responsible remain unknown.
Diverse tissue specimens archived by medical practitioners at the time of an outbreak represent an essentially untapped genetic museum for pathogen research. One extraordinary collection includes a preserved intestine, collected by Dr. John Neill, from a patient who died from cholera in the 1849 Philadelphia outbreak (Figure 1: Historical Intestinal Specimen). By coupling targeted enrichment with high-throughput sequencing,7 we used trace amounts of degraded DNA from this specimen to reconstruct a mid–19th century V. cholerae genome and to test the hypothesis that the O1 classical biotype was responsible for the second cholera pandemic.


We extracted DNA with the use of an organic extraction protocol and converted it into Illumina sequencing libraries8 in dedicated ancient-DNA facilities. We then enriched the libraries for the O1 classical genome (strain O395; National Center for Biotechnology Information reference sequences NC_009456 and NC_009457); for regions not found in classical strains (e.g., vibrio seventh-pandemic islands I and II [VSP-I and VSP-II]); for the human mitochondrial genome and the amelogenin X and Y genes; and for other regions. We mapped the sequencing reads to strain O395 with the use of Burrows–Wheeler Aligner, version 0.5.9rc1 (release 1561),9 and aligned the resulting consensus sequence to 31 full genomes (Table S2 in Supplementary Appendix 1, available with the full text of this article) using Mauve.10 We called SNPs with a custom Perl script and visualized features of the sequence in comparison with those of strain O395 with the use of Circos, version 0.36-4,11 (Figure 2: The PA1849 Genome). We used Gblocks12 to prune insertions, deletions, and potential misalignments, which resulted in a final alignment of 28,591 SNPs that we used in maximum-likelihood13 and Bayesian14 phylogenetic analyses.
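The custom Perl script used to call SNPs is not reproduced here. As a rough illustration only, the Python sketch below shows one common way to call substitutions from two already-aligned sequences: it walks the alignment column by column, tracks the ungapped reference coordinate, and skips gaps and ambiguous bases so that indels and uncovered sites are not counted as SNPs. The function name `call_snps` and its skip rules are assumptions, not the authors' script.

```python
from typing import List, Tuple

# Characters treated as uninformative: alignment gaps and ambiguous/uncovered bases.
SKIP = set("-Nn?")


def call_snps(reference: str, consensus: str) -> List[Tuple[int, str, str]]:
    """Return (reference_position, ref_base, query_base) for each substitution.

    Positions are 1-based coordinates in the ungapped reference. Columns where
    either sequence has a gap or an ambiguous base are skipped, so indels and
    uncovered sites are not reported as SNPs.
    """
    if len(reference) != len(consensus):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for ref_base, query_base in zip(reference, consensus):
        if ref_base != "-":
            ref_pos += 1  # advance only on real reference bases
        if ref_base in SKIP or query_base in SKIP:
            continue
        if ref_base.upper() != query_base.upper():
            snps.append((ref_pos, ref_base.upper(), query_base.upper()))
    return snps


# Toy alignment: one insertion in the consensus, one uncovered site (N), two SNPs.
print(call_snps("ACGT-ACGTA", "ACTTAACNTG"))  # [(3, 'G', 'T'), (9, 'A', 'G')]
```

Skipping ambiguous columns rather than counting them keeps low-coverage regions from inflating the SNP count, at the cost of missing real differences at those sites.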
Our full methods and results are available in Supplementary Appendix 1. Sequences can be found at the Sequence Read Archive under the BioProject accession number SRP029921. Ethics approval for the study was obtained from Hamilton Health Sciences and McMaster University.


Historical V. cholerae Genome

We reconstructed a draft V. cholerae genome (PA1849) at an average unique coverage depth of 15.0×. It covers 94.8% of the O395 reference genome at a depth of at least 1.0× and differs from it by 203 SNPs (see Supplementary Appendix 2). If regions not present in PA1849 (see below) are excluded, 97.4% of the reference sequence is covered, at an average depth of 15.4×. Regions of reference strain O395 not covered by our sequencing data could represent missing, rearranged, or highly divergent regions in the historical genome itself; could result from preservation or procedural biases (e.g., poorly preserved AT-rich regions, biased amplification, and uneven enrichment7,15); or both. The V. cholerae DNA fragments show typical ancient-DNA damage patterns (Fig. S4 in Supplementary Appendix 1).16 Overall, coverage correlates strongly with GC content across most regions (Figure 2, and Fig. S3 in Supplementary Appendix 1).
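The two coverage figures reported above (breadth at 1.0× or more, and mean depth, with and without excluding absent regions) can both be computed from per-position read depths. The sketch below is a minimal Python illustration under stated assumptions. `coverage_summary` is a hypothetical helper, and depths are taken to be already deduplicated ("unique") read counts. Mean depth is averaged over every considered position, covered or not. The text does not say exactly how its averages were computed.

```python
from typing import Optional, Sequence, Set, Tuple


def coverage_summary(
    depths: Sequence[int],
    min_depth: int = 1,
    excluded: Optional[Set[int]] = None,
) -> Tuple[float, float]:
    """Return (breadth, mean_depth) over the reference positions considered.

    depths[i] is the number of unique reads covering 0-based reference
    position i. Positions in `excluded` (for example, islands judged absent
    from the sample) are left out of both figures. Breadth is the fraction
    of considered positions with depth >= min_depth; mean depth is averaged
    over all considered positions.
    """
    excluded = excluded or set()
    kept = [d for i, d in enumerate(depths) if i not in excluded]
    if not kept:
        raise ValueError("no positions left after exclusion")
    breadth = sum(1 for d in kept if d >= min_depth) / len(kept)
    mean_depth = sum(kept) / len(kept)
    return breadth, mean_depth


# Toy depth track for ten reference positions.
depths = [0, 2, 4, 0, 6, 8, 0, 0, 10, 10]
print(coverage_summary(depths))                   # (0.6, 4.0)
print(coverage_summary(depths, excluded={6, 7}))  # (0.75, 5.0)
```

Excluding positions that are genuinely absent from the sample raises both breadth and mean depth. This matches the direction of the change described above, from 94.8% to 97.4% and from 15.0× to 15.4×.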

Genomic Islands and Virulence Factors

Strain PA1849 shares the following phylocore genome (PG) genomic islands (GIs) with strain O395: O1, vibrio pathogenicity islands 1 and 2 (VPI-1 and VPI-2), and GI-1 through GI-10 (Table S3 in Supplementary Appendix 1). In addition, PA1849 possesses the PG-2 islands GI-23 (a putative prophage found today only in classical strains O395 and RC27) and GI-24 (a putative prophage with CRISPR [clustered regularly interspaced short palindromic repeat]–associated proteins) (Figure 2).4 Strain PA1849 does not have GI-11, GI-14, or GI-21; the absence of these GIs suggests that the modern classical strains acquired them after 1849.
Strain PA1849 contains all known major virulence regions common to classical V. cholerae (e.g., VPI-1, VPI-2, and the CTX prophage) but lacks nonclassical genomic regions and variants (e.g., VSP-I and VSP-II) (Tables S3 and S4 in Supplementary Appendix 1). The average GC content of these missing loci is not substantially lower than that of the successfully recovered genomic regions, suggesting that their absence is unlikely to be an artifact of preservation. VPI-1 has lower-than-average coverage (7.5×, vs. 15.0×), which is probably a result of its relatively low GC content (35%, vs. 46.7% for the entire genome) rather than its absence from the historical genome (Figure 2). Relative to the corresponding regions in strain O395, VPI-1 in strain PA1849 contains one synonymous SNP (in tcpA), and VPI-2 contains four SNPs.
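The argument above rests on two calculations: comparing GC content between loci, and relating GC content to coverage. A minimal Python sketch of both follows. The helper names, the fixed non-overlapping windows, and the use of a plain Pearson coefficient are illustrative assumptions, not the analysis reported in the supplementary material.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (Ns are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc_and_depth(
    seq: str, depths: Sequence[int], window: int
) -> Tuple[List[float], List[float]]:
    """GC fraction and mean read depth in non-overlapping windows.

    A trailing partial window is dropped.
    """
    if len(seq) != len(depths):
        raise ValueError("sequence and depth track must have equal length")
    gcs: List[float] = []
    means: List[float] = []
    for start in range(0, len(seq) - window + 1, window):
        gcs.append(gc_fraction(seq[start:start + window]))
        means.append(sum(depths[start:start + window]) / window)
    return gcs, means


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (undefined if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data: three 4-bp windows with rising GC content and rising depth.
toy_seq = "AATT" + "ACGT" + "GGCC"
toy_depths = [2] * 4 + [5] * 4 + [8] * 4
gcs, means = windowed_gc_and_depth(toy_seq, toy_depths, window=4)
print(gcs, means)                     # [0.0, 0.5, 1.0] [2.0, 5.0, 8.0]
print(round(pearson(gcs, means), 6))  # 1.0
```

On real data, a locus whose GC content is close to the genome average but whose coverage is near zero is harder to explain as a GC-driven dropout. That is the reasoning the paragraph applies to VSP-I and VSP-II.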
Like strain O395, strain PA1849 contains the classical ctxB and rstR variants and the expected deletion in the large-chromosome RTX element. It probably also carries the CTX prophage at the same chromosomal positions as strain O395, because its flanking regions appear identical, albeit observed at low coverage. However, its CTX prophage configuration, which varies between strains,4,17 has not been observed in classical strains to date. Read assemblies indicate a tandem CTX repeat on one or both chromosomes (Figs. S8 and S9 in Supplementary Appendix 1), and no read assemblies support the truncated CTX prophage repeat that is typical of modern classical strains (Fig. S10 in Supplementary Appendix 1).

Human Mitochondrial and Nuclear DNA

The complete mitochondrial genome of the patient with cholera was retrieved at a coverage depth of 149.0×, and the reads exhibit a typical ancient-DNA damage profile (Fig. S4C in Supplementary Appendix 1).16 The consensus sequence belongs to haplogroup L3d1b3, found today in sub-Saharan western Africa.18 Reads matching amelogenin gene X and Y alleles suggest that this patient was male, although coverage across these regions was poor.

Origin and Evolution of V. cholerae

Our phylogenetic analysis revealed a major division between the PG-1 and PG-2 lineages (Figure 3: Evolutionary Analysis of Vibrio cholerae). Most El Tor strains cluster in the seventh-pandemic (7P) clade, which also includes strain MO10.4,19 Representatives of clades L6 (strain NCTC 8457), L3 (strain 2740-80), and L5 (strains M66-2 and MAK757), together with the 7P clade, make up the PG-1 clade, whereas the PG-2 clade comprises L7 (strain V52), L1 (classical strain), and strain PA1849. Strain PA1849 sits several SNPs away from the L1 clade node, with strong bootstrap support.
Our initial attempts to date the evolutionary history of V. cholerae were hindered by an inability to estimate evolutionary rates from tip dates alone, even when PG-1 and PG-2 were analyzed separately. This is probably the result of a combination of site saturation and the extensive recombination that is typical of V. cholerae.4,19,20 We confidently detected at least 37 recombination events between PG-1 and PG-2 (Figure 3), as well as a number of intragroup recombination events and events that brought genetic diversity into PG-1 and PG-2 from unknown parental lineages. Such a recombination frequency makes it difficult to determine whether specific recombination events explain the recent predominance of El Tor strains.
To overcome these limitations, we imposed a strict molecular clock (1.3×10⁻³ nucleotide substitutions per SNP site per year) based on a reanalysis of a large El Tor data set.19 Assuming this rate, we estimate that the El Tor 7P strains emerged between 1940 and 1957 (95% highest posterior density interval), in agreement with previous estimates.19 Similarly, we estimate that the ancestor of the classical strains originated between 1843 and 1860, and that the lineage leading to strain PA1849 diverged between 1797 and 1813, close to the time of the first recognized cholera pandemic, in 1817.21 Together, the topology and the temporal estimates suggest that the first five cholera pandemics were caused by V. cholerae strains sharing a common core genome, each pandemic representing a clonal reemergence with few genome-scale mutational differences.
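Under a strict molecular clock, branch length in substitutions per SNP site grows linearly with time. A node's age is therefore its distance to a dated tip divided by the rate. The Python sketch below shows only that point-estimate arithmetic, using the rate quoted above. It is not the Bayesian analysis reported in the text, which also estimates the tree and yields 95% highest posterior density intervals. The function name `node_date` and the example distance are hypothetical.

```python
# Strict-clock rate quoted in the text: substitutions per SNP site per year.
RATE_PER_SNP_SITE_PER_YEAR = 1.3e-3


def node_date(sampling_year: float, node_to_tip_distance: float,
              rate: float = RATE_PER_SNP_SITE_PER_YEAR) -> float:
    """Calendar year of an ancestral node under a strict molecular clock.

    node_to_tip_distance is the path length, in substitutions per SNP site,
    from the node to a tip sampled in sampling_year. With a constant rate,
    elapsed time is distance / rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if node_to_tip_distance < 0:
        raise ValueError("distance cannot be negative")
    return sampling_year - node_to_tip_distance / rate


# Hypothetical example: a node 0.052 substitutions per SNP site above a tip
# sampled in 1849 sits about 40 years earlier.
print(round(node_date(1849, 0.052)))  # 1809
```

Because the relation is linear, any bias in the measured distance carries straight through to the date. This is why the site saturation and recombination discussed above matter for these estimates.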


The PA1849 genome has a number of unique structural features but differs from strain O395 by only a few hundred SNPs across the entire 4-Mb genome. This suggests that modern V. cholerae evolution has been subject to substantial selective constraint since the mid-19th century, similar to other pathogens that exhibit long-term conservation of the core genome sequence over centuries (e.g., Yersinia pestis7 and Mycobacterium leprae22). One of the striking features of the PA1849 historical genome is the tandem CTX configuration. This could indicate that the strain was capable of producing infectious CTX virions,23 which potentially conferred greater pathogenic capacity.24 If so, the suggestion that the absence of CTX virion production in classical strains may have contributed to their replacement by El Tor24 may be unfounded. However, a V. cholerae strain with tandem CTX repeats (B33) does not produce replicating virions.25 The functional implications of this structure in strain PA1849 therefore cannot be confirmed without experimental expression in a model vibrio strain.
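A rough calculation makes "only a few hundred SNPs" concrete. It uses the 203 SNPs reported above and the round 4-Mb genome size, and it ignores that only about 95% of the reference was covered:

```latex
\[
\frac{203\ \text{SNPs}}{4 \times 10^{6}\ \text{sites}} \approx 5.1 \times 10^{-5}\ \text{differences per site},
\qquad
1 - 5.1 \times 10^{-5} \approx 99.995\%\ \text{identity}
\]
```

Restricting the denominator to the covered 94.8% of the reference raises the density only slightly, to about 5.4×10⁻⁵ differences per site, and does not change the picture of a highly conserved core genome.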
Previous attempts to date the origin of pandemic V. cholerae have returned widely varying results (e.g., approximately 130 to 50,000 years ago for the classical–El Tor split26). If a constant evolutionary rate of 1.3×10⁻³ substitutions per SNP site per year were applied across the entire phylogeny, the common ancestor of all pathogenic V. cholerae (PG, potentially the ancestral strain first adapted to humans) would date to only 430 to 440 years ago. However, site saturation and recombination together make this figure an underestimate. The PG ancestor more likely dates to a time scale of millennia, predating all historically recognized pandemics and arguing against a post-medieval origin of pathogenic V. cholerae.26 Our analysis therefore suggests that the PG-1 and PG-2 lineages cocirculated in humans and water sources for many centuries, and potentially thousands of years, before the 19th-century pandemics. This finding is compatible with the theory that cholera is a disease of the “first epidemiological transition,” during which sedentary agriculture (beginning approximately 10,000 years ago) opened new disease niches.27
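The 430-to-440-year figure follows from the strict-clock relation time = distance / rate. Running that relation backward shows the root-to-tip distance it implies. The saturation argument can then be written as an inequality. If the observed distance is smaller than the true amount of change, the date computed from it is too recent by the same factor. The symbols below are introduced for illustration: d is root-to-tip distance in substitutions per SNP site, r is the rate, and t is time in years.

```latex
\[
t = \frac{d}{r}
\;\Longrightarrow\;
d = r\,t \approx \left(1.3 \times 10^{-3}\ \text{yr}^{-1}\right) \times \left(430\ \text{to}\ 440\ \text{yr}\right) \approx 0.56\ \text{to}\ 0.57
\]
\[
d_{\text{obs}} < d_{\text{true}}
\;\Longrightarrow\;
\hat{t} = \frac{d_{\text{obs}}}{r} < \frac{d_{\text{true}}}{r} = t_{\text{true}}
\]
```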
Collections of historical pathological specimens are invaluable resources for reconstructing pathogen evolution, yet the study of these collections remains a sensitive topic, because it was common for the bodies of marginalized minorities and the poor to be retained for medical research without consent.28 We hope that by highlighting the intrinsic scientific, historical, and social value of these underappreciated collections, we can help to recognize and protect them in perpetuity.
Supported by Canada Research Chairs from the Natural Sciences and Engineering Research Council of Canada (to Drs. Poinar and Golding), a grant from the Social Sciences and Humanities Research Council of Canada (to Dr. Poinar), a National Health and Medical Research Council Australia Fellowship (to Dr. Holmes), and an Ontario Graduate Scholarship (to Ms. Devault).
Disclosure forms provided by the authors are available with the full text of this article.
This article was published on January 8, 2014.
We thank Dr. D. Ann Herring, Dr. D. Poinar, Dr. C. Yates, Dr. G. Wright, and current and former members of the McMaster Ancient DNA Centre, McMaster University, Hamilton, ON, Canada.


From the McMaster Ancient DNA Centre (A.M.D., J.M.E., M.K., S.F., K.I.B., H.N.P.), Departments of Anthropology (A.M.D., M.K., K.I.B., H.N.P.), Biology (J.M.E., G.B.G., H.N.P.), and Mathematics and Statistics (D.J.D.E.), and the Michael G. DeGroote Institute for Infectious Disease Research (N.W., D.J.D.E., H.N.P.), McMaster University, Hamilton, ON, and the Dalla Lana School of Public Health, Toronto (D.N.F.) — all in Canada; the Department of Mathematics, Ohio State University, Columbus (J.H.T.); Marie Bashir Institute for Infectious Diseases and Biosecurity Institute, School of Biological Sciences and Sydney Medical School, University of Sydney, Sydney (M.S., E.C.H.); and the College of Physicians of Philadelphia, Mütter Museum, Philadelphia (A.N.D.).
Address reprint requests to Dr. Poinar at the Department of Anthropology, McMaster University, 1280 Main St. W., Hamilton, ON L8S 4L9, Canada.
