Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling

Jian Sun, Joshua S. Fu, John B. Drake, Qingzhao Zhu, Azzam Haidar, Mark Gates, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Global chemistry-climate models are computationally burdened as the chemical mechanisms become more complex and realistic. Optimization for graphics processing units (GPU) may make longer global simulation with regional detail possible, but limited study has been done to explore the potential benefit for the atmospheric chemistry modeling. Hence, in this study, the second-order Rosenbrock solver of the chemistry module of CAM4-Chem is ported to the GPU to gauge potential speed-up. We find that on the CPU, the fastest performance is achieved using the Intel compiler with a block interleaved memory layout. Different combinations of compiler and memory layout lead to ~11.02× difference in the computational time. In contrast, the GPU version performs the best when using a combination of fully interleaved memory layout with block size equal to the warp size, CUDA streams for independent kernels, and constant memory. Moreover, the most efficient data transfer between CPU and GPU is gained by allocating the memory contiguously during the data initialization on the GPU. Compared to one CPU core, the speed-up of using one GPU alone reaches a factor of ~11.7× for the computation alone and ~3.82× when the data transfer between CPU and GPU is considered. Using one GPU alone is also generally faster than the multithreaded implementation for 16 CPU cores in a compute node and the single-source solution (OpenACC). The best performance is achieved by the implementation of the hybrid CPU/GPU version, but rescheduling the workload among the CPU cores is required before the practical CAM4-Chem simulation.
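The memory layouts named in the abstract can be pictured as different ways of flattening per-cell data into one array. The sketch below is only an illustration under stated assumptions, not the authors' code: it assumes each grid cell carries `n_species` values, that "block interleaved" means cells are grouped into blocks of `block` cells with values interleaved inside each block, and that the cell count is a multiple of `block`. The function names are hypothetical.

```python
def block_interleaved_index(cell, k, n_species, block):
    """Flat offset of value k of `cell` (assumed layout, see lead-in).

    Cells are grouped into blocks of `block` cells; inside a block,
    value k of all cells is stored contiguously.
    """
    b, lane = divmod(cell, block)  # which block, and position within it
    return b * n_species * block + k * block + lane


def layout(n_cells, n_species, block):
    """Return the flat array as a list of (cell, k) labels."""
    if n_cells % block != 0:
        raise ValueError("n_cells must be a multiple of block in this sketch")
    flat = [None] * (n_cells * n_species)
    for cell in range(n_cells):
        for k in range(n_species):
            flat[block_interleaved_index(cell, k, n_species, block)] = (cell, k)
    return flat


if __name__ == "__main__":
    # block=1: each cell's values are contiguous (non-interleaved)
    print(layout(4, 2, 1))
    # block=2: values interleaved within blocks of two cells
    print(layout(4, 2, 2))
    # block=n_cells: fully interleaved (value k of all cells contiguous)
    print(layout(4, 2, 4))
```

Under these assumptions, setting `block` to 1 gives a non-interleaved layout and setting it to the number of cells gives a fully interleaved one. On a GPU, interleaving lets neighbouring threads, each handling one cell, read neighbouring addresses.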

Original language: English
Pages (from-to): 1952-1969
Number of pages: 18
Journal: Journal of Advances in Modeling Earth Systems
Volume: 10
Issue number: 8
State: Published - Aug 2018

Funding

The diagnostic simulations in this study use the resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy (contract DE-AC05-00OR22725). The CESM project is supported by the National Science Foundation and the Office of Science (BER) of the U.S. Department of Energy. The GPU material is also based on work supported by the National Science Foundation under grant OAC 1740250. The authors thank Jean-Francois Lamarque for his support from the previous university subproject of the DOE SciDAC project “Chemistry in CESM-SE: Evaluation, Performance, and Optimization” (UCAR subaward Z12-93537 to University of Tennessee, Knoxville). The source code for the model used in this study, CAM4-Chem, is freely available at http://www.cesm.ucar.edu/models/cesm1.2/. The CPU and GPU codes for the chemistry box model are available from the authors upon request ([email protected]).

Keywords

  • CUDA
  • GPU
  • compiler
  • data transfer
  • hybrid
  • memory layout
