Landrush: Rethinking In-Situ Analysis for GPGPU Workflows

Anshuman Goswami, Yuan Tian, Karsten Schwan, Fang Zheng, Jeffrey Young, Matthew Wolf, Greg Eisenhauer, Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

14 Scopus citations

Abstract

In-situ analysis of the output data of scientific simulations has become necessary because output data volumes keep growing and data movement keeps getting more expensive as supercomputing moves toward exascale. As hardware accelerators such as GPUs become increasingly common in high-end machines, new opportunities arise to co-locate scientific simulations with online analysis of the data they generate. However, the asynchronous nature of GPGPU programming models and the GPU's limited context-switching capabilities make it difficult to co-locate a scientific simulation and its analysis on the same GPU. This paper examines these challenges in depth to determine how best to co-locate analysis with scientific simulations on the GPUs of HPC clusters. Specifically, our 'Landrush' approach to GPU sharing uses idle cycles on the GPU to improve time-to-answer, that is, the total time to run the scientific simulation and analyze the data it generates. Landrush is demonstrated with experimental results from leadership high-end applications on ORNL's Titan supercomputer. These results show that (i) GPU-based scientific simulations have varying degrees of idle cycles that can accommodate useful co-located analysis tasks, and (ii) the GPU's inability to context switch at instruction granularity can be overcome by carefully controlling analysis kernel launches and by having analysis kernels complete early under software control. The results also show that Landrush achieves a better time-to-answer than either running the simulation and then the analysis serially, or relying on the GPU driver and hardwired thread dispatcher to run the analysis concurrently on a single GPU.
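The abstract describes the mechanism only at a high level. The Python sketch below is a toy, discrete-time model of the idea, not the authors' implementation: analysis work is launched only into the idle window that follows each simulation step, and the analysis "kernel" checks a stop condition between work items so that it completes early instead of delaying the next simulation kernel. All names (Step, run_analysis_kernel, time_to_answer), the fixed per-item cost, and the assumption that idle windows are known in advance are hypothetical. The model also ignores launch overhead, memory contention, and the latency of signaling a running kernel on a real GPU (where the stop condition would presumably be a flag written by the host).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One simulation timestep in a toy model (hypothetical): the simulation's
    kernels keep the GPU busy for kernel_ms, then the GPU sits idle for idle_ms
    (for example while the host does communication or I/O)."""
    kernel_ms: int
    idle_ms: int


def run_analysis_kernel(items_left: int, item_ms: int, budget_ms: int) -> int:
    """Model an analysis kernel that checks a stop condition between work items.

    Instead of running to completion, it returns as soon as the next item
    would overrun the idle window, so it never delays the next simulation
    kernel. Returns the number of items it completed."""
    done = 0
    elapsed = 0
    while done < items_left:
        if elapsed + item_ms > budget_ms:  # the simulation needs the GPU back
            break  # software-controlled early completion
        elapsed += item_ms
        done += 1
    return done


def time_to_answer(steps: list[Step], items_per_step: int, item_ms: int,
                   co_locate: bool = True) -> int:
    """Total time (ms) until both the simulation and all analysis are done."""
    total_ms = 0
    backlog = 0  # analysis items produced but not yet processed
    for s in steps:
        total_ms += s.kernel_ms + s.idle_ms  # simulation timeline is unchanged
        backlog += items_per_step  # this step's output now awaits analysis
        if co_locate:
            # Launch analysis only into the idle window that follows the step.
            backlog -= run_analysis_kernel(backlog, item_ms, s.idle_ms)
    # Anything that did not fit into an idle window runs after the simulation.
    return total_ms + backlog * item_ms
```

As an illustration with made-up numbers (not results from the paper), take three steps of 10 ms busy and 4 ms idle, each producing two 3 ms analysis items. `time_to_answer([Step(10, 4)] * 3, 2, 3, co_locate=False)` gives 60 ms for the serial baseline. With co-location, `time_to_answer([Step(10, 4)] * 3, 2, 3)` gives 51 ms, because one item fits into each idle window and only the remaining three run after the simulation.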

Original language: English
Title of host publication: Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 32-41
Number of pages: 10
ISBN (Electronic): 9781509024520
State: Published - Jul 18 2016
Event: 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 - Cartagena, Colombia
Duration: May 16 2016 to May 19 2016

Publication series

Name: Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016

Conference

Conference: 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Country/Territory: Colombia
City: Cartagena
Period: 05/16/16 to 05/19/16

Keywords

  • GPU workflow
  • In-situ analysis
  • instrumentation
  • runtime scheduler
