New tools to determine copy number variations raise questions about reproducibility
Copy number variations (CNVs) - large segments of DNA that have been duplicated or deleted - play a role in disease susceptibility and drug response. New tools exist that examine the prevalence of CNVs in the protein-coding part of the genome, but their accuracy remains unknown. A new study from NHGRI investigators looks at the reproducibility of the results from the most current tools and finds that further improvements are necessary. The research was published in the August 8 edition of Genome Medicine.
Assessing the reproducibility of exome copy number variations predictions
Genome Medicine 2016 8:82
DOI: 10.1186/s13073-016-0336-6
Received: 31 March 2016
Accepted: 13 July 2016
Published: 8 August 2016
Abstract
Background
Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes, and a major challenge is reproducing their results in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers; varying sample sizes; and varying capture methodology.
Methods
Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine reproducibility, we ran the callers on four datasets, on varying sample sizes (N = 10, 30, 75, 100, and 300), and on data generated with different capture methodologies. We examined the false negative (FN) and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking the concordance of CNV calls against single nucleotide polymorphism (SNP) array data.
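To make the PPV measurement concrete, here is a minimal Python sketch. It treats an exome CNV call as a true positive when an array call of the same type on the same chromosome covers at least half of it, and reports PPV as true positives divided by all exome calls. The tuple layout, the function names, and the 50% overlap threshold are illustrative assumptions, not the concordance rule used in the study.

```python
from typing import List, Tuple

# Hypothetical call layout: (chromosome, start, end, type), e.g. ("chr1", 100, 200, "DEL").
Call = Tuple[str, int, int, str]


def is_supported(call: Call, ref: Call, min_frac: float = 0.5) -> bool:
    """Return True if `ref` covers at least `min_frac` of `call`.

    Both calls must be on the same chromosome and of the same type.
    The 0.5 default is an assumed threshold, not taken from the paper.
    """
    if call[0] != ref[0] or call[3] != ref[3]:
        return False
    shared = min(call[2], ref[2]) - max(call[1], ref[1])
    length = call[2] - call[1]
    return length > 0 and shared / length >= min_frac


def ppv(exome_calls: List[Call], array_calls: List[Call], min_frac: float = 0.5) -> float:
    """PPV = TP / (TP + FP), where every exome call is either a TP or an FP."""
    if not exome_calls:
        return float("nan")
    true_positives = sum(
        any(is_supported(call, ref, min_frac) for ref in array_calls)
        for call in exome_calls
    )
    return true_positives / len(exome_calls)
```

With this rule, a caller that emits three calls of which one is confirmed by the array has a PPV of 1/3. Changing `min_frac` changes the PPV, which is one reason such thresholds need to be reported alongside the results.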
Results
Using independently generated datasets, we examined the PPV for each dataset and observed a wide range of PPVs. The PPV values were highly data dependent (p < 0.001). For the sample size and capture method analyses, we tested the callers in triplicate. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, a negative correlation between PPV and sample size was observed for CoNIFER (ρ = –0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
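The ρ reported for CoNIFER is a rank correlation between sample size and PPV. As a rough sketch of how such a value can be computed, the Python below implements Spearman's ρ with the standard library only, using average ranks for ties. The sample sizes come from the Methods; the PPV values are made-up placeholders for illustration and are not results from the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


sample_sizes = [10, 30, 75, 100, 300]            # from the Methods
placeholder_ppvs = [0.90, 0.80, 0.85, 0.60, 0.50]  # hypothetical values, not study data
rho = spearman_rho(sample_sizes, placeholder_ppvs)
```

A negative `rho` means PPV tends to fall as more samples are added to the run, which is the direction of the trend described for CoNIFER.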
Conclusions
Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to capture a wider spectrum of CNVs, including small CNVs. These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. Such evaluation of exome CNV calling is essential to support the wide utility and applicability of CNV discovery in exome studies.