PLoS One. 2013 Oct 17;8(10):e77649. doi: 10.1371/journal.pone.0077649.
Molecular Characterization of Ambiguous Mutations in HIV-1 Polymerase Gene: Implications for Monitoring HIV Infection Status and Drug Resistance.
Source: Division of Global HIV/AIDS, Center for Global Health, Centers for Disease Control and Prevention (CDC), Atlanta, Georgia, United States of America.
Detection of recent HIV infections is a prerequisite for reliable estimation of transmitted HIV drug resistance (t-HIVDR) and incidence. However, accurately identifying recent HIV infection is challenging, due in part to the limitations of current serological tests. Ambiguous nucleotides are newly emerged mutations in quasispecies, and they accumulate with time since viral infection. We used ambiguous mutations to establish a measure for detecting recent HIV infection and monitoring early HIVDR development. Ambiguous nucleotides were extracted from HIV-1 pol-gene sequences in datasets of recent infections (HIVDR threshold surveys [HIVDR-TS] in 7 countries; n=416) and established infections (1 HIVDR monitoring survey at baseline; n=271). An ambiguous mutation index of 2.04×10⁻³ nts/site was detected in recent HIV-1 infections, which is equivalent to the previously reported HIV-1 substitution rate (2×10⁻³ nts/site/year). However, a significantly higher index (14.41×10⁻³ nts/site) was found in established infections. Using this substitution rate, 75.2% of subjects in HIVDR-TS (excluding the Vietnam dataset) and 3.3% of those in HIVDR-baseline were classified as recent infections within one year. We also calculated mutation scores at the amino acid level at HIVDR sites based on ambiguous or fitted mutations. The overall mutation scores caused by ambiguous mutations increased (from 0.54×10⁻² to 3.48×10⁻²/DR-site), whereas those caused by fitted mutations remained stable (7.50-7.89×10⁻²/DR-site) across recent and established infections, indicating that t-HIVDR exists in drug-naïve populations regardless of infection status, and that new HIVDR continues to emerge in these populations. Our findings suggest that characterization of ambiguous mutations in HIV may serve as an additional tool to differentiate recent from established infections and to monitor HIVDR emergence.
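As a rough illustration of how an ambiguous mutation index in nts/site could be computed, the sketch below counts IUPAC ambiguity codes in each sequence and divides by the number of sites scored. The function names, the pooling of counts across sequences, and the treatment of `N` and gap characters as missing data are assumptions made for illustration; the abstract does not specify these details.

```python
# Two- and three-base IUPAC ambiguity codes (mixed base calls).
AMBIGUOUS_CODES = frozenset("RYKMSWBDHV")
# Assumption: N (any base) and alignment gaps are missing data, not mixtures.
MISSING = frozenset("N-")


def count_ambiguous(seq):
    """Return (ambiguous_count, scored_sites) for one nucleotide sequence."""
    scored = [base for base in seq.upper() if base not in MISSING]
    ambiguous = sum(1 for base in scored if base in AMBIGUOUS_CODES)
    return ambiguous, len(scored)


def ambiguous_mutation_index(sequences):
    """Pooled ambiguous nucleotides per scored site (hypothetical definition)."""
    total_ambiguous = 0
    total_sites = 0
    for seq in sequences:
        ambiguous, sites = count_ambiguous(seq)
        total_ambiguous += ambiguous
        total_sites += sites
    if total_sites == 0:
        raise ValueError("no scored sites in the dataset")
    return total_ambiguous / total_sites
```

Under these assumptions, a dataset index could then be compared with the per-year substitution rate quoted in the abstract; whether the authors pooled counts or averaged per-sequence indices is not stated here.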
Descriptive statistics of ambiguous mutation in various sequence datasets.
The plot of ambiguous mutations with descriptive statistics was generated using an online statistical tool (http://www.physics.csbsju.edu/stats/). Each country dataset is described by its minimum and maximum (short horizontal lines at the bottom and top of the box), interquartile range (IQR, 1st to 3rd quartile; box), median (line inside the box), suspected outliers (open dots), and outliers (solid dots). The number in brackets is the number of sequences from each country: Angola (AO), Botswana (BW), China (CN), Kenya (KE), Malawi (MW), Tanzania (TZ), Vietnam (VN), Nigeria (NG), and Canada (CA). Numbers marked with an asterisk were calculated without the outlier in the dashed square box. The Figure 1 inset shows the descriptive statistics of the ambiguous mutation index in the dataset by subtype (Table 3).
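The box-plot elements listed in this legend can be computed in a few lines. The sketch below assumes the conventional Tukey fences (suspected outliers beyond 1.5×IQR from the quartiles, outliers beyond 3×IQR) and Python's inclusive quartile method; the online tool used for the figure may compute quartiles slightly differently, and `box_plot_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Summarize values the way a box plot does (needs at least 2 values)."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # beyond these: suspected outlier
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # beyond these: outlier
    outliers = [v for v in data if v < outer[0] or v > outer[1]]
    suspected = [
        v for v in data
        if (v < inner[0] or v > inner[1]) and v not in outliers
    ]
    # Whisker ends: most extreme values that are not flagged at all.
    regular = [v for v in data if inner[0] <= v <= inner[1]]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(regular), "whisker_high": max(regular),
        "suspected_outliers": suspected, "outliers": outliers,
    }
```

Applied to per-sequence counts of ambiguous mutations for one country, this would give the box, median line, whisker ends, and flagged points described in the legend.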
Distribution of ambiguous mutations and data statistical description of three data subsets.
The frequency distribution of sequences by number of ambiguous mutations (AMs) was plotted by subset: (A) Threshold (n=346), (B) Baseline (n=271), and (C) Vietnam (VN) (n=68). The statistical description of the 3 data subsets is plotted in (D) by number and index of ambiguous mutations, using the same method as described in Figure 1. The percentage in A-C indicates the recent infections in each dataset, classified as having ≤2 AMs per sequence (indicated by the dashed line).
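The frequency distribution and the ≤2 AMs cutoff described in this legend amount to a simple tally. The sketch below is illustrative only: the function names are hypothetical, and the input is assumed to be a list holding one AM count per sequence.

```python
from collections import Counter


def am_frequency(am_counts):
    """Map each AM count to the number of sequences with that count."""
    return dict(sorted(Counter(am_counts).items()))


def percent_recent(am_counts, max_ams=2):
    """Percentage of sequences with at most `max_ams` ambiguous mutations."""
    if not am_counts:
        raise ValueError("am_counts is empty")
    recent = sum(1 for count in am_counts if count <= max_ams)
    return 100.0 * recent / len(am_counts)
```

For example, `percent_recent([0, 1, 2, 3, 10])` counts three of five sequences at or below the cutoff and returns 60.0.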
Proportional distribution of mutated and ambiguous mutated amino acids at HIVDR sites.
The mutation score at each of the drug resistance sites was calculated proportionally from the mutated and ambiguous mutated amino acids for all the sequences in the datasets. A mutated or ambiguous mutated amino acid was defined as an amino acid that had mutated from the wild type to a pure non-synonymous mutation or to an ambiguous mutation in a mixture allele. Scores were summed, counting 1 for a pure amino acid mutation and 0.5 for an ambiguous amino acid mutation, and then converted to percentages against the total number of wild-type amino acids at the site. The distribution of drug resistance mutation scores was plotted for the Threshold (bottom panel), Vietnam (VN, central panel), and Baseline (top panel) datasets. The x-axis shows the wild-type amino acids at drug resistance sites; the y-axis shows the drug resistance mutation score (%). Sites with obvious score changes across the 3 datasets from bottom to top panel are labeled with an up-triangle (increased), rhombus (remained), or down-triangle (decreased). Amino acids of the protease gene (Prt) are marked with a dashed line above, and those of the reverse transcriptase gene (RT) with a solid line above.
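The scoring rule in this legend (1 for a pure mutation, 0.5 for an ambiguous one, expressed as a percentage per site) can be sketched as follows. The input layout, the state labels, and the use of the number of sequences scored at each site as the denominator are assumptions for illustration; the legend does not define the denominator more precisely, and the site names in the example are placeholders.

```python
from collections import defaultdict

# Weights taken from the legend; wild type contributes nothing.
WEIGHTS = {"wild_type": 0.0, "pure": 1.0, "ambiguous": 0.5}


def site_mutation_scores(calls):
    """Return {site: score in percent}.

    `calls` is an iterable of (site, state) pairs, one per sequence per
    drug resistance site, where state is a key of WEIGHTS.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for site, state in calls:
        totals[site] += WEIGHTS[state]
        counts[site] += 1  # assumed denominator: sequences scored at the site
    return {site: 100.0 * totals[site] / counts[site] for site in counts}
```

With four sequences at a site, one pure and one ambiguous mutation give (1 + 0.5) / 4, or 37.5%.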
Index of mutated and ambiguous mutated amino acids at HIVDR sites by data subset.
The total score of drug resistance mutations caused by pure mutated amino acids or by ambiguous mutated amino acids was calculated separately for each data subset and divided by the total number of drug resistance sites to obtain the index of mutated or ambiguous mutated amino acids by subset. The definitions and score calculations for pure mutated and ambiguous mutated amino acids are described in Figure 3.