Exascale deep learning for climate analytics

Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat Prabhat, Michael Houston

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

225 Scopus citations

Abstract

We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and the network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.
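The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes that parallel efficiency means sustained throughput divided by the throughput that perfect linear scaling would give, which the abstract does not spell out. The function names and the `runs` table are hypothetical helpers, not part of the paper's software.

```python
# Back-of-the-envelope check of the abstract's scaling figures.
# Assumption (not stated in the abstract): efficiency = sustained / ideal,
# where "ideal" is the throughput under perfect linear scaling.

def per_gpu_tflops(sustained_pflops: float, gpus: int) -> float:
    """Average sustained throughput per GPU in TF/s (1 PF/s = 1000 TF/s)."""
    return sustained_pflops * 1000.0 / gpus


def ideal_pflops(sustained_pflops: float, efficiency: float) -> float:
    """Throughput that perfect scaling would imply, under the assumption above."""
    return sustained_pflops / efficiency


# (sustained PF/s, GPU count, parallel efficiency), copied from the abstract.
runs = {
    "Tiramisu (P100)": (21.0, 5300, 0.790),
    "DeepLabv3+ single precision (V100)": (325.8, 27360, 0.907),
}

for name, (pflops, gpus, eff) in runs.items():
    print(
        f"{name}: {per_gpu_tflops(pflops, gpus):.2f} TF/s per GPU, "
        f"ideal {ideal_pflops(pflops, eff):.1f} PF/s"
    )

# Half-precision DeepLabv3+: ratio of sustained to peak throughput.
# 1.13 EF/s is 1130 PF/s.
fp16_sustained_over_peak = 999.0 / 1130.0
print(f"FP16 sustained/peak: {fp16_sustained_over_peak:.3f}")
```

The per-GPU values are averages over the whole machine allocation, so they say nothing about variation between nodes or about the single-GPU baseline the authors actually measured.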

Original language: English
Title of host publication: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 649-660
Number of pages: 12
ISBN (Electronic): 9781538683842
State: Published - Jul 2 2018
Event: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States
Duration: Nov 11 2018 to Nov 16 2018

Publication series

Name: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

Conference

Conference: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Country/Territory: United States
City: Dallas
Period: 11/11/18 to 11/16/18

Funding

This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under Project ID g107. We thank Nicholas Cardo, Andreas Joksch, Miguel Gila, and the CSCS staff for assistance in using Piz Daint. We thank Paul Tucker and Rajat Monga from Google for helpful discussions pertaining to TensorFlow. Michael Wehner, Karthik Kashinath, Burlen Loring, Travis O’Brien, and Bill Collins from LBNL were instrumental in motivating the climate science problem and providing datasets. This research used the Summit system at the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We are very grateful to OLCF staff Veronica Melesse Vergara, Don Maxwell, and Matthew Ezell for their assistance with the runs, and to Arjun Shankar, Ashley Barker, Tjerk Straatsma, and Jack Wells for programmatic support.
