Abstract
Background: Ongoing advances in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now serve as high-performance workstations for querying, processing, and visualizing genomics "Big Data" from sources such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine.

Results: QM is an open-source, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data-retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing extra plugins or programs. A client library provides high-level distribution templates, including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest: QM received more than 2.2 million API calls from 87 countries in 12 months.

Conclusions: QM was found to be adequate for delivering the scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable browsers, acting as compute nodes, to address the critical privacy concerns that characterize biomedical environments.
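The post-task / volunteer-execute / retrieve-result pattern and the MapReduce template mentioned in the abstract can be illustrated with a minimal in-memory sketch. Everything in it is an assumption for illustration only: `TaskBoard`, `post_task`, `volunteer_step`, `get_result`, `map_reduce`, and `common_suffix` are hypothetical names, not QM's client library or HTTP API; a local queue stands in for HTTP messages; and "the longest suffix shared by all inputs" is a deliberately simplified stand-in for the shared-suffix analysis of the S. pneumoniae genomes.

```python
from collections import deque
from dataclasses import dataclass, field
from functools import reduce
from typing import Any, Callable, Deque, Dict, List, Tuple


# Hypothetical stand-in for a task/result message board. A real deployment
# would exchange these messages over HTTP; here a local deque plays that role.
@dataclass
class TaskBoard:
    pending: Deque[Tuple[str, Callable[[Any], Any], Any]] = field(default_factory=deque)
    results: Dict[str, Any] = field(default_factory=dict)

    def post_task(self, key: str, func: Callable[[Any], Any], arg: Any) -> None:
        """Submitter side: enqueue a task under a unique key."""
        self.pending.append((key, func, arg))

    def volunteer_step(self) -> bool:
        """Volunteer side: run one pending task and post its result.
        Returns False when there is nothing left to do."""
        if not self.pending:
            return False
        key, func, arg = self.pending.popleft()
        self.results[key] = func(arg)
        return True

    def get_result(self, key: str) -> Any:
        """Submitter side: retrieve a finished result by key."""
        return self.results[key]


def map_reduce(board: TaskBoard, mapper: Callable[[Any], Any],
               reducer: Callable[[Any, Any], Any], inputs: List[Any]) -> Any:
    """Post one map task per input, let volunteers drain the board,
    then fold the mapped results with `reducer` on the submitter side."""
    if not inputs:
        raise ValueError("map_reduce needs at least one input")
    keys = [f"map-{i}" for i in range(len(inputs))]
    for key, item in zip(keys, inputs):
        board.post_task(key, mapper, item)
    while board.volunteer_step():  # a single simulated volunteer drains the queue
        pass
    return reduce(reducer, (board.get_result(k) for k in keys))


def common_suffix(a: str, b: str) -> str:
    """Longest suffix shared by two strings (empty string if none)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


if __name__ == "__main__":
    seqs = ["acgttag", "GGCTAG", "ttag"]  # toy sequences, not real genome data
    print(map_reduce(TaskBoard(), str.upper, common_suffix, seqs))  # TAG
```

In this sketch the map step (normalizing case) runs on the "volunteer" and only the small mapped results return to the submitter for the reduce step; with many volunteers polling the same board, the map tasks could run in parallel without changing the submitter's code.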
Original language | English |
---|---|
Article number | 176 |
Journal | BMC Bioinformatics |
Volume | 15 |
Issue number | 1 |
DOIs | |
State | Published - Jun 9 2014 |
Externally published | Yes |
Funding
This work was supported in part by the Center for Clinical and Translational Sciences of the University of Alabama at Birmingham under contract no. 5UL1RR025777-03 from the NIH National Center for Research Resources. This work was also supported in part by an NCI T32 Trainee Grant at Rice University under contract no. 5T32CA096520-05.
Funders | Funder number |
---|---|
Center for Clinical and Translational Sciences of the University of Alabama at Birmingham | 5UL1RR025777-03 |
National Institutes of Health | |
National Cancer Institute | T32CA096520 |
National Center for Research Resources | |
Rice University | 5T32CA096520-05 |
Keywords
- Cloud computing
- Crowdsourcing
- Distributed computing
- JavaScript
- MapReduce
- PaaS
- Sequence analysis
- Web service