Hybrid cloud and cluster computing paradigms for life science applications

Judy Qiu, Jaliya Ekanayake, Thilina Gunarathne, Jong Y. Choi, Seung Hee Bae, Hui Li, Bingjing Zhang, Tak Lon Wu, Yang Ruan, Saliya Ekanayake, Adam Hughes, Geoffrey Fox

Research output: Contribution to journal, Article, peer-review

48 Scopus citations

Abstract

Background: Clouds and MapReduce have proven broadly useful for scientific computing, especially for parallel data-intensive applications. However, their applicability is limited in areas such as data mining, because MapReduce performs poorly on problems with the iterative structure found in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, which leads to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open-source Iterative MapReduce system.

Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open-source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.

Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications.

Methods: We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.

Original language: English
Article number: S3
Journal: BMC Bioinformatics
Volume: 11
Issue number: SUPPL. 12
State: Published - Dec 21 2010
Externally published: Yes

Funding

Abbreviations: MPI: Message Passing Interface; NSF: National Science Foundation; UC Santa Barbara HPC Research: University of California Santa Barbara High Performance Computing Research; OCI: Office of Cyberinfrastructure; DOE: Department of Energy; EU: European Union; VM: Virtual Machine; HPC: High Performance Computing; DNA: Deoxyribonucleic Acid; BLAST: Basic Local Alignment Search Tool; MDS: Multidimensional Scaling; JVM: Java Virtual Machine.

We thank Microsoft for their technical support. This work was made possible by the computing use grant provided by Amazon Web Services, titled “Proof of concepts linking FutureGrid users to AWS”. This work is partially funded by the Microsoft “CRMC” grant and NIH Grant Number RC2HG005806-02. This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for “FutureGrid: An Experimental, High-Performance Grid Test-bed.” Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily ...

This article has been published as part of BMC Bioinformatics Volume 11 Supplement 12, 2010: Proceedings of the 11th Annual Bioinformatics Open Source Conference (BOSC) 2010. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S12.

Funders and funder numbers
University of California Santa Barbara High Performance Computing Research
National Science Foundation: 0910812
National Institutes of Health: RC2HG005806-02
U.S. Department of Energy
Microsoft
Indiana University
University of California, Santa Barbara
Amazon Web Services
European Commission
