Genome research creating data that's too big for IT -- GCN
The biomedical research community is generating data at a rate that has simply outstripped the ability of traditional IT tools to make the best scientific use of it.
At least that is the gist of a letter this month to the cancer research community from George Komatsoulis, chief information officer of the National Cancer Institute, which is soliciting best practices on how to overcome the challenge.
Data generated by genome sequencing and the use of large-scale imaging technologies are "breaking the standard model by which researchers manage and analyze data," he wrote in a recent blog post.
In an effort to head off the trend, the NCI has asked all of its grantees this month for input on a set of pilot projects that would test the feasibility of setting up a "cancer knowledge cloud" that would equip researchers with the computational tools they need to meet the big data demands of big science.
By combining storage repositories and computing power in the cloud, researchers could overcome some of the limitations that classic data management practices are putting on one of the NCI's biggest goals: building The Cancer Genome Atlas (TCGA) data set.
By its conclusion in 2014, the TCGA project is expected to have generated 2.5 petabytes, or roughly 20 million gigabits, of data. Even with a 10 gigabit/second link, Komatsoulis estimated it would take 23 days (2 million seconds) to download the dataset. A faster solution, he pointed out, would be to ship a disk array via the U.S. Postal Service.
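The transfer-time estimate is straightforward to verify. A minimal back-of-the-envelope sketch, assuming a decimal petabyte (10^15 bytes) and a fully saturated 10 gigabit/second link with no protocol overhead:

```python
# Verify the download-time estimate for the 2.5-petabyte TCGA dataset.
# Assumptions (not from the article): decimal units and a link running
# at its full nominal rate with zero overhead.

dataset_bytes = 2.5e15            # 2.5 petabytes
dataset_bits = dataset_bytes * 8  # 2.0e16 bits, i.e. ~20 million gigabits
link_bps = 10e9                   # 10 gigabit/second link

seconds = dataset_bits / link_bps  # 2,000,000 seconds
days = seconds / 86_400            # ~23.1 days

print(f"{dataset_bits / 1e9:,.0f} gigabits")
print(f"{seconds:,.0f} seconds = {days:.1f} days")
```

Any real-world transfer would take longer, since sustained throughput on a shared 10 Gb/s link rarely reaches the nominal rate, which is why shipping physical media can win at this scale.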