Learning to Scale the Summit: AI for Science on a Leadership Supercomputer

Wayne Joubert, Bronson Messer, Philip C. Roth, Antigoni Georgiadou, Justin Lietz, Markus Eisenbach, Junqi Yin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Summit system at Oak Ridge National Laboratory (ORNL) has been the world's top AI for science supercomputer for several years, ranked world's fastest computer at its 2018 launch and currently top system in the US and #2 on the TOP500 list. Summit's purposeful design to handle both conventional modeling and simulation science and emerging AI workloads has made it a leading destination for AI-powered computational science. We report here on AI for science usage on Summit near the midpoint of its lifespan. We review AI usage across the many science projects that have used Summit. We then examine in detail a set of applications scaling AI to full system as well as projects implementing AI-coordinated science discovery workflows on Summit. Finally, we offer some observations regarding the future of advancing scientific knowledge and understanding via AI, especially in the context of leadership-class scientific computing.

Original language: English
Title of host publication: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1246-1255
Number of pages: 10
ISBN (Electronic): 9781665497473
State: Published - 2022
Event: 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 - Virtual, Online, France
Duration: May 30 2022 to Jun 3 2022

Publication series

Name: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022

Conference

Conference: 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Country/Territory: France
City: Virtual, Online
Period: 05/30/22 to 06/3/22

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • AI
  • HPC
  • artificial intelligence
  • high performance computing
  • machine learning
