VGrADS: Enabling e-Science workflows on grids and clouds with fault tolerance

Lavanya Ramakrishnan, Charles Koelbel, Yang Suk Kee, Rich Wolski, Daniel Nurmi, Dennis Gannon, Graziano Obertelli, Asim YarKhan, Anirban Mandal, T. Mark Huang, Kiran Thyagaraja, Dmitrii Zagorodnov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

60 Scopus citations

Abstract

Today's scientific workflows use distributed heterogeneous resources through diverse grid and cloud interfaces that are often hard to program. In addition, especially for time-sensitive critical applications, predictable quality of service is necessary across these distributed resources. VGrADS' virtual grid execution system (vgES) provides a uniform qualitative resource abstraction over grid and cloud systems. We apply vgES for scheduling a set of deadline-sensitive weather forecasting workflows. Specifically, this paper reports on our experiences with (1) virtualized reservations for batch-queue systems, (2) coordinated usage of TeraGrid (batch queue), Amazon EC2 (cloud), our own clusters (batch queue) and Eucalyptus (cloud) resources, and (3) fault tolerance through automated task replication. The combined effect of these techniques was to enable a new workflow planning method to balance performance, reliability and cost considerations. The results point toward improved resource selection and execution management support for a variety of e-Science applications over grids and cloud systems.

Original language: English
Title of host publication: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
State: Published - 2009
Externally published: Yes
Event: Conference on High Performance Computing Networking, Storage and Analysis, SC '09 - Portland, OR, United States
Duration: Nov 14 2009 to Nov 20 2009

Publication series

Name: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09

Conference

Conference: Conference on High Performance Computing Networking, Storage and Analysis, SC '09
Country/Territory: United States
City: Portland, OR
Period: 11/14/09 to 11/20/09
