TY - JOUR
T1 - Cloud computing paradigms for pleasingly parallel biomedical applications
AU - Gunarathne, Thilina
AU - Wu, Tak Lon
AU - Choi, Jong Youl
AU - Bae, Seung Hee
AU - Qiu, Judy
PY - 2011/12/10
Y1 - 2011/12/10
AB - Cloud computing offers exciting new approaches for scientific computing that leverage major commercial players' hardware and software investments in large-scale data centers. Loosely coupled problems are very important in many scientific fields, and with the ongoing move towards data-intensive computing, they are on the rise. There exist several different approaches to leveraging clouds and cloud-oriented data processing frameworks to perform pleasingly parallel (also called embarrassingly parallel) computations. In this paper, we present three pleasingly parallel biomedical applications: (i) assembly of genome fragments; (ii) sequence alignment and similarity search; and (iii) dimension reduction in the analysis of chemical structures. These are implemented using the cloud infrastructure service-based utility computing models of Amazon Web Services (Amazon.com, Inc., Seattle, WA, USA) and Microsoft Windows Azure (Microsoft Corp., Redmond, WA, USA), as well as the MapReduce-based data processing frameworks Apache Hadoop (Apache Software Foundation, Los Angeles, CA, USA) and Microsoft DryadLINQ. We review these frameworks and compare them on performance, cost, and usability. High-latency, eventually consistent cloud infrastructure service-based frameworks that rely on off-the-node cloud storage exhibited performance efficiencies and scalability comparable to those of the MapReduce-based frameworks with local disk-based storage for the applications considered. We also analyze variations in cost among the different platform choices (e.g., Elastic Compute Cloud instance types), highlighting the importance of selecting an appropriate platform based on the nature of the computation.
KW - bioinformatics
KW - cloud technology
KW - map reduce
UR - http://www.scopus.com/inward/record.url?scp=80054919303&partnerID=8YFLogxK
U2 - 10.1002/cpe.1780
DO - 10.1002/cpe.1780
M3 - Article
AN - SCOPUS:80054919303
SN - 1532-0626
VL - 23
SP - 2338
EP - 2354
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
IS - 17
ER -