Pre-exascale accelerated application development: The ORNL Summit experience

L. Luo, T. P. Straatsma, L. E. Aguilar Suarez, R. Broer, D. Bykov, E. F. D'Azevedo, S. S. Faraji, K. C. Gottiparthi, C. De Graaf, J. A. Harris, R. W. A. Havenith, H. J. Aa Jensen, W. Joubert, R. K. Kathir, J. Larkin, Y. W. Li, D. I. Lyakh, O. E. B. Messer, M. R. Norman, J. C. Oefelein, R. Sankaran, A. F. Tillack, A. L. Barnes, L. Visscher, J. C. Wells, M. Wibowo

Research output: Contribution to journal, Article, peer-review

18 Scopus citations

Abstract

High-performance computing (HPC) increasingly relies on heterogeneous architectures to achieve higher performance. At the Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN, USA, this trend continued when its latest supercomputer, Summit, entered production in early 2019. The combination of IBM POWER9 CPUs and NVIDIA V100 GPUs, together with a fast NVLink2 interconnect and other recent technologies, pushes system performance to a new height and breaks the exascale barrier by certain measures. Because of Summit's powerful GPUs and much higher GPU-to-CPU ratio, offloading to accelerators becomes a requirement for any application that intends to use the system effectively. To help navigate a complex landscape of competing heterogeneous architectures, a collection of applications from a wide spectrum of scientific domains was selected for early adoption on Summit. This article summarizes the experience and lessons learned, in the hope of providing useful guidance for future application development on heterogeneous HPC systems, where new programming challenges such as scalability, performance portability, and software maintainability must be addressed.

Original language: English
Article number: 8960361
Journal: IBM Journal of Research and Development
Volume: 64
Issue number: 3-4
DOIs
State: Published - May 1 2020

Funding

The research projects described in this article used resources of the OLCF, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. Models in E3SM-MMF were obtained from the E3SM project, sponsored by the U.S. DOE, Office of Science, Office of Biological and Environmental Research. Development work on GronOR is part of the (Shell-NWO) research program of the Foundation for Fundamental Research on Matter, which is part of the Netherlands Organization for Scientific Research (NWO), and part of a European Joint Doctorate (EJD) in Theoretical Chemistry and Computational Modelling (TCCM), which has been financed under the framework of the Innovative Training Networks (ITN) of the Marie Skłodowska-Curie Actions (ITN-EJD-642294-TCCM). FLASH was developed, in part, by the DOE NNSA ASC- and DOE Office of Science ASCR-supported Flash Center for Computational Science at the University of Chicago. Additional support for FLASH development was provided by the ECP (17-SC-20-SC), a collaborative effort of the U.S. DOE Office of Science and the NNSA.

Funders (funder number, where listed)
Netherlands Organization for Scientific Research
Office of Biological and Environmental Research
U.S. DOE
U.S. Department of Energy
Office of Science: DE-AC05-00OR22725
National Nuclear Security Administration
University of Chicago: 17-SC-20-SC
H2020 Marie Skłodowska-Curie Actions: ITN-EJD-642294-TCCM
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
