Monday, June 23, 2014

JAMA Network | JAMA | Assessing Value in Biomedical Research: The PQRST of Appraisal and Reward

Impact of Biomedical Research: Reproducibility & Translation

CDC paper: Assessing value in biomedical research: The PQRST of appraisal and reward

John P. A. Ioannidis & Muin J. Khoury, JAMA, June 9, 2014

CDC Public Health Grand Rounds, "Measuring science impact",

YouTube, Jan 27

CDC paper: Increasing value and reducing waste in research design, conduct, and analysis.

Ioannidis J, et al. Lancet, January 2014

Policy: NIH plans to enhance reproducibility,

Francis S. Collins & Lawrence A. Tabak, Nature News, Jan 27

Why most published research findings are false.

John P. A. Ioannidis PLoS Medicine (2005)

CDC paper: Most published research findings are false, but a little replication goes a long way.

Moonesinghe R, et al. PLoS Medicine (2007)

NIH presses journals to focus on reproducibility of studies,

by Paul Basken, Chronicle of Higher Education Jun 6

Assessing Value in Biomedical Research: The PQRST of Appraisal and Reward

John P. A. Ioannidis, MD, DSc; Muin J. Khoury, MD, PhD

JAMA. Published online June 09, 2014. doi:10.1001/jama.2014.6932

Production of scientific work is regulated by reward systems. Scientists are typically rewarded for publishing articles, obtaining grants, and claiming novel, significant results. However, emphasis on publication can lead to least publishable units, authorship inflation, and potentially irreproducible results. Emphasis on claiming significant results leads to lack of publication of nonsignificant high-quality studies or to massaging data to obtain “positive” results. Emphasis on novelty leaves no incentives to spend resources on replicating prior findings to probe their correctness. Data owners have a publishing advantage without incentives to share with competitor scientists.

In the past, grapevine knowledge among the few knowledgeable experts allowed discerning the good work from the waste. But currently the noise-to-signal ratio is tremendous, with the proliferation of technologies (such as genomics) and journals. Thousands of new journals publish work for a fee, regardless of the quality of the work.1 To change the tide, the criteria by which scientists and their teams are rewarded for their efforts by agencies that fund them and institutions that host them should be revisited,2 aligning criteria with the desired outcomes: research that is productive, high-quality, reproducible, shareable, and translatable—or PQRST for short.

Productivity metrics should reward high-influence science rather than least publishable units and decrease publication bias against negative results. Instead of counting each and every publishable unit, even now several major universities ask only for the top papers from each candidate for appointment or promotion. However, the process can be standardized. Citation databases such as ISI Web of Knowledge/Essential Science Indicators automatically identify scientific fields and the x% top-cited articles in each scientific field and each year (x can be set at 1%, 10%, or other desirable percentage). Authorship contributions should also be considered when allocating credit for multiauthored papers using standard formulas such as harmonic adjustments.3
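The article does not spell out the harmonic adjustment it cites. A minimal sketch of one common formulation of harmonic counting follows, assuming that the author in byline position i of an N-author paper receives a share proportional to 1/i, normalized so that the shares sum to 1. The function name `harmonic_credit` is hypothetical, and the sketch ignores conventions such as extra weight for senior or corresponding authors.

```python
from fractions import Fraction


def harmonic_credit(n_authors):
    """Return the credit share for each byline position, first author first.

    Assumption (not stated in the article): position i receives a share
    proportional to 1/i, and the shares are normalized to sum to 1.
    """
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    # Normalizing constant: the N-th harmonic number 1 + 1/2 + ... + 1/N.
    total = sum(Fraction(1, k) for k in range(1, n_authors + 1))
    return [Fraction(1, i) / total for i in range(1, n_authors + 1)]


# For a 3-author paper the shares are 6/11, 3/11, and 2/11.
shares = harmonic_credit(3)
print([str(s) for s in shares])
```

Exact fractions are used so that the shares add up to exactly 1 regardless of the number of authors; a funder summing credit across many papers could convert to floating point at the end.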

Another useful metric is the proportion of published scientific work emanating from a research project. Funders can keep records of the publication of funded projects. Even for clinical trials funded by federal resources, a substantial proportion remain unpublished several years after their completion.4 For other types of research, nonpublication is likely to be even more frequent. Should investigators receive more funds if they used previous funds without publishing anything? In fields such as clinical research of interventions, public study registration has become widely accepted (and even enforced by regulatory agencies) and the concept of registration can be extended to other fields when appropriate.5 Registration allows documenting whether research studies result in published reports within a reasonable time of completion. Registration of protocols and analysis plans can also help evaluate selective reporting, ie, whether only some outcomes or analyses have been published and whether analyses deviate from promised plans.
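As an illustration only, the proportion of funded projects that result in a published report could be computed from a funder's own records along the following lines. The record fields, the `published_fraction` name, and the two-year window are all assumptions made for this sketch; the article speaks only of "a reasonable time of completion" and prescribes no threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Project:
    """Hypothetical funder record for one funded or registered study."""
    project_id: str
    completed: date
    first_publication: Optional[date]  # None if nothing has been published


def published_fraction(projects: List[Project], as_of: date,
                       window_days: int = 730) -> Optional[float]:
    """Fraction of projects published within `window_days` of completion.

    Only projects whose window has fully elapsed by `as_of` are counted,
    so recently completed work is not penalized. Returns None when no
    project is old enough to be assessed.
    """
    eligible = [p for p in projects
                if (as_of - p.completed).days >= window_days]
    if not eligible:
        return None
    on_time = sum(
        1 for p in eligible
        if p.first_publication is not None
        and (p.first_publication - p.completed).days <= window_days
    )
    return on_time / len(eligible)
```

The same records could be grouped by investigator or by grant to support the question the article raises about awarding further funds after unpublished work.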

For quality assessments, focusing on top-cited articles in assessing productivity already captures some aspects of quality, but citations and quality are not perfectly correlated. Funders can ask that protocols and studies fulfill specific quality standards in their design, implementation, and analysis. The National Institutes of Health is already moving in this direction with implementation of checklists.6 There are also numerous reporting standards for diverse research fields, as summarized by the EQUATOR initiative.7 Each field should agree on what quality features are essential. Nevertheless, difficulties in rating quality objectively should not be underestimated. Under pressure to comply, investigators may simply check off requested items while ignoring other fundamental issues relevant to the specific study. Quality assessments may focus on very few, uncontroversial, and easily verifiable study aspects. These assessments also may be used to promote improvement of that particular aspect in the whole field, eg, routine use of randomization and blinded assessments in preclinical animal studies.8

Reproducibility can range from repeating analyses with raw data to independent replication using different materials or study participants or even study designs different (more rigorous) than the original study. Some types of reproducibility checks are easy. Others are prohibitively difficult, eg, performing another similar trial with new participants and 10 years of follow-up to independently replicate the results of a clinical study. This should be taken into account in deciding whether replication and reproducibility checks should be requested routinely (eg, when easy and inexpensive to do) or under select circumstances (eg, only for the most influential papers, if difficult and expensive to perform).

Sharing can also be measured. For each scientist, it is possible to assess how many papers are accompanied by shareable data, materials, or protocols. Indexing databases such as PubMed and ClinicalTrials.gov could note in each new publication record whether such shared resources are available. Many funders and journals have already made sharing practices mandatory for particular research types.

As for translation, much excellent research has no recognizable translational application or benefit. Scientific influence, if any, often becomes manifest many years after the initial discovery. However, translational performance is relevant in research with direct applied aspirations, and this covers most of preclinical investigation and clinical medicine. For example, for preclinical studies of interventions, the translational milestone may be successful evaluation of the same intervention in humans; for clinical research, it may be licensing or approval for clinical use.

Given current resources, some of these indices can be easily evaluated for all scientists and for all their work. Other metrics and investigators need more focused appraisals; eg, it is impossible to perform reproducibility checks on every single published article. It is much easier to focus on the most influential articles, which are the ones considered in item P (eg, the top-cited 1%). A small budget is needed to reproduce articles that attain top scientific influence and thus have a major effect on the course of science. Many such articles already include replications, eg, currently all most-cited genome association studies include by default extensive replication in independent populations. Assessments of quality and translational influence that lack all-encompassing automated databases may also need to focus on the most influential work. Proportion of published work and assessments of sharing practices can relatively easily be automated science-wide by funders, registries, indexing online libraries, or other resources.
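To make the idea of focusing on the top-cited x% concrete, here is a rough sketch of selecting the most-cited articles within each scientific field and publication year, for example as candidates for reproducibility checks. The dictionary keys (`field`, `year`, `citations`) and the function name `top_cited` are hypothetical, as is the choice to keep at least one article per group; this is not how any particular citation database implements its rankings, and ties are broken arbitrarily.

```python
import math
from collections import defaultdict


def top_cited(articles, percent):
    """Return the top `percent`% most-cited articles per (field, year).

    `articles` is a list of dicts with hypothetical keys
    'field', 'year', and 'citations'.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Group articles so that each is ranked only against its own field and year.
    groups = defaultdict(list)
    for article in articles:
        groups[(article["field"], article["year"])].append(article)

    selected = []
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda a: a["citations"], reverse=True)
        # Assumption: round up and keep at least one article per group.
        keep = max(1, math.ceil(len(ranked) * percent / 100))
        selected.extend(ranked[:keep])
    return selected
```

Ranking within field and year matters because citation volumes differ widely across fields and older papers have had more time to accumulate citations.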

The Table illustrates some suggestions for how to potentially operationalize these principles. The suggestions are not prescriptive but may offer ideas to funders and other stakeholders for next practical steps. The exact combination or weighting of indices will require discussion and consensus among stakeholders. Selected reward system choices should also factor the potential for gaming any appraisal and reward system. Potential untoward consequences of gaming should be anticipated, minimized, and monitored. For example, scientists may acknowledge funding from specific grants for entirely unrelated published work if their career depends on demonstrating that funding resulted in publications. If so, funders should verify that the funded work has indeed been published. Or, if reward is given only for top-cited articles, networking between investigators and journals may create a citation factory of mediocre articles that mutually propel themselves toward the top-cited range. Some of the other indices will correct this; eg, observational nutritional epidemiology has some of the most-cited papers across all science, but much of this work has failed replication.