Cloud computing paradigms for pleasingly parallel biomedical applications

Thilina Gunarathne, Tak Lon Wu, Jong Youl Choi, Seung Hee Bae, Judy Qiu

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

Cloud computing offers exciting new approaches for scientific computing that leverage major commercial players' hardware and software investments in large-scale data centers. Loosely coupled problems are very important in many scientific fields, and with the ongoing move towards data-intensive computing, they are on the rise. There are several different approaches to leveraging clouds and cloud-oriented data processing frameworks to perform pleasingly parallel (also called embarrassingly parallel) computations. In this paper, we present three pleasingly parallel biomedical applications: (i) assembly of genome fragments; (ii) sequence alignment and similarity search; and (iii) dimension reduction in the analysis of chemical structures. These are implemented using the cloud infrastructure service-based utility computing models of Amazon Web Services (Amazon.com, Inc., Seattle, WA, USA) and Microsoft Windows Azure (Microsoft Corp., Redmond, WA, USA), as well as the MapReduce-based data processing frameworks Apache Hadoop (Apache Software Foundation, Los Angeles, CA, USA) and Microsoft DryadLINQ. We review each of these frameworks and compare them on performance, cost, and usability. For the applications considered, high-latency, eventually consistent cloud infrastructure service-based frameworks that rely on off-the-node cloud storage exhibited performance efficiencies and scalability comparable to those of the MapReduce-based frameworks with local disk-based storage. We also analyze variations in cost among the different platform choices (e.g., Elastic Compute Cloud instance types), highlighting the importance of selecting an appropriate platform based on the nature of the computation.

Original language: English
Pages (from-to): 2338-2354
Number of pages: 17
Journal: Concurrency and Computation: Practice and Experience
Volume: 23
Issue number: 17
State: Published - Dec 10 2011
Externally published: Yes

Keywords

  • bioinformatics
  • cloud technology
  • map reduce
