TY - GEN
T1 - VGrADS
T2 - Conference on High Performance Computing Networking, Storage and Analysis, SC '09
AU - Ramakrishnan, Lavanya
AU - Koelbel, Charles
AU - Kee, Yang Suk
AU - Wolski, Rich
AU - Nurmi, Daniel
AU - Gannon, Dennis
AU - Obertelli, Graziano
AU - YarKhan, Asim
AU - Mandal, Anirban
AU - Huang, T. Mark
AU - Thyagaraja, Kiran
AU - Zagorodnov, Dmitrii
PY - 2009
Y1 - 2009
AB - Today's scientific workflows use distributed heterogeneous resources through diverse grid and cloud interfaces that are often hard to program. In addition, especially for time-sensitive critical applications, predictable quality of service is necessary across these distributed resources. VGrADS' virtual grid execution system (vgES) provides a uniform qualitative resource abstraction over grid and cloud systems. We apply vgES to schedule a set of deadline-sensitive weather forecasting workflows. Specifically, this paper reports on our experiences with (1) virtualized reservations for batch-queue systems, (2) coordinated usage of TeraGrid (batch queue), Amazon EC2 (cloud), our own clusters (batch queue) and Eucalyptus (cloud) resources, and (3) fault tolerance through automated task replication. The combined effect of these techniques was to enable a new workflow planning method that balances performance, reliability and cost considerations. The results point toward improved resource selection and execution management support for a variety of e-Science applications over grids and cloud systems.
UR - http://www.scopus.com/inward/record.url?scp=74049149936&partnerID=8YFLogxK
U2 - 10.1145/1654059.1654107
DO - 10.1145/1654059.1654107
M3 - Conference contribution
AN - SCOPUS:74049149936
SN - 9781605587448
T3 - Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
BT - Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
Y2 - 14 November 2009 through 20 November 2009
ER -