Job and data clustering for aggregate use of multiple production cyberinfrastructures

Ketan Maheshwari, Allan Espinosa, Daniel S. Katz, Michael Wilde, Zhao Zhang, Ian Foster, Scott Callaghan, Phillip Maechling

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

In this paper, we address the challenges of reducing the time-to-solution of the data-intensive earthquake simulation workflow "CyberShake" by supplementing the high-performance parallel computing (HPC) resources on which it typically runs with distributed, heterogeneous resources that can be obtained opportunistically from grids and clouds. We seek to minimize time-to-solution by maximizing the amount of work that can be done efficiently on the distributed resources. We identify data movement as the main bottleneck in effectively utilizing the combined local and distributed resources. We address this by analyzing the I/O characteristics of the application, the processor acquisition rate (from a pilot-job service), and the data movement throughput of the infrastructure. With these factors in mind, we explore a combination of strategies including partitioning of computation (over HPC and distributed resources) and job clustering. We validate our approach with a theoretical study and with preliminary measurements on the Ranger HPC system and distributed Open Science Grid resources. More complete performance results will be presented in the final submission of this paper.
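The job clustering idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed per-job overhead, and the cost model are all assumptions made here for illustration. It only shows why grouping many small tasks into fewer submitted jobs can reduce the fixed cost (scheduling, data staging) paid per job.

```python
from typing import List, Sequence


def cluster_jobs(task_ids: Sequence[int], cluster_size: int) -> List[List[int]]:
    """Group tasks into clusters of at most `cluster_size` tasks each."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return [list(task_ids[i:i + cluster_size])
            for i in range(0, len(task_ids), cluster_size)]


def total_overhead(num_tasks: int, cluster_size: int,
                   per_job_overhead_s: float) -> float:
    """Hypothetical cost model: one fixed overhead is paid per submitted cluster."""
    num_clusters = -(-num_tasks // cluster_size)  # ceiling division
    return num_clusters * per_job_overhead_s


# Ten tasks grouped four at a time give three clusters (4 + 4 + 2 tasks).
clusters = cluster_jobs(range(10), 4)
unclustered_cost = total_overhead(10, 1, 5.0)  # one job per task
clustered_cost = total_overhead(10, 4, 5.0)    # one job per cluster
```

Under this assumed model, larger clusters pay the fixed overhead less often, but they also coarsen the units of work that can be spread across opportunistically acquired processors, so the best cluster size depends on the measured I/O and acquisition rates the abstract refers to.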

Original language: English
Title of host publication: DIDC'12 - 5th International Workshop on Data-Intensive Distributed Computing
Pages: 3-11
Number of pages: 9
State: Published - 2012
Externally published: Yes
Event: 5th International Workshop on Data-Intensive Distributed Computing, DIDC'12 - Delft, Netherlands
Duration: Jun 19 2012 - Jun 19 2012

Publication series

Name: DIDC'12 - 5th International Workshop on Data-Intensive Distributed Computing

Conference

Conference: 5th International Workshop on Data-Intensive Distributed Computing, DIDC'12
Country/Territory: Netherlands
City: Delft
Period: 06/19/12 - 06/19/12

Keywords

  • HPC
  • Parallel
  • SCEC
  • Swift
