Revisiting Temporal Blocking Stencil Optimizations

  • Lingqi Zhang
  • Mohamed Wahib
  • Peng Chen
  • Jintao Meng
  • Xiao Wang
  • Toshio Endo
  • Satoshi Matsuoka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations

Abstract

Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Given the prevalence of GPU-accelerated supercomputers, much effort has gone into optimizing stencil kernels for GPUs. Temporal blocking is an optimization that improves data locality by combining a batch of time steps and processing them together. Observing that GPUs are evolving to resemble CPUs in some aspects, we revisit temporal blocking optimizations for GPUs. We explore how temporal blocking schemes can be adapted to new features in recent NVIDIA GPUs, including large scratchpad memory, hardware prefetching, and device-wide synchronization. We propose a novel temporal blocking method, EBISU, which champions low device occupancy to drive aggressive deep temporal blocking on large tiles that are executed tile-by-tile. We compare EBISU with state-of-the-art temporal blocking libraries, STENCILGEN and AN5D, and with state-of-the-art stencil auto-tuning tools equipped with temporal blocking optimizations, ARTEMIS and DRSTENCIL. Over a wide range of stencil benchmarks, EBISU achieves speedups of up to 2.53x and a geometric mean speedup of 1.49x over the best state-of-the-art performance in each stencil benchmark.
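
To make the general idea of temporal blocking concrete, the following is a minimal sketch in plain Python. It is not EBISU and not taken from the paper; it illustrates one simple textbook variant (overlapped tiling with redundant halo computation) on a hypothetical 1D 3-point averaging stencil with fixed end cells. The function names `step`, `naive`, and `temporally_blocked`, and the `tile` parameter, are assumptions made for illustration only.

```python
def step(u):
    """One 3-point averaging step; the two end cells are held fixed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive(u, steps):
    """Baseline: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, tile):
    """Overlapped tiling: advance each tile by `steps` time steps at once.

    Each tile is loaded with a halo of `steps` cells per side (clamped to
    the domain). Holding the halo's outer cell fixed is wrong when it is
    not a true domain boundary, but that error moves inward by only one
    cell per step, so the tile interior is still exact after `steps` steps.
    """
    n = len(u)
    result = u[:]
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)      # left edge of tile plus halo
        hi = min(end + steps, n)        # right edge of tile plus halo
        local = u[lo:hi]                # stand-in for fast on-chip memory
        for _ in range(steps):
            local = step(local)
        result[start:end] = local[start - lo:end - lo]
    return result


if __name__ == "__main__":
    u0 = [float((i * 7) % 11) for i in range(64)]
    ref = naive(u0, 4)
    blk = temporally_blocked(u0, 4, 16)
    print(max(abs(a - b) for a, b in zip(ref, blk)))
```

In this sketch each tile is read from the full array once and written once per batch of `steps` time steps, at the cost of recomputing the halo cells. On a GPU the `local` buffer would typically live in scratchpad (shared) memory or registers; how EBISU sizes and schedules its tiles is described in the paper itself.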

Original language: English
Title of host publication: ACM ICS 2023 - Proceedings of the International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 251-263
Number of pages: 13
ISBN (Electronic): 9798400700569
State: Published - Jun 21 2023
Event: 37th ACM International Conference on Supercomputing, ICS 2023 - Orlando, United States
Duration: Jun 21 2023 to Jun 23 2023

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 37th ACM International Conference on Supercomputing, ICS 2023
Country/Territory: United States
City: Orlando
Period: 06/21/23 to 06/23/23

Funding

This work was supported by JSPS KAKENHI under Grant Numbers JP22H03600 and JP21K17750. This work was supported by JST, PRESTO Grant Number JPMJPR20MA, Japan. This paper is based on results obtained from the JPNP20006 project, commissioned by the New Energy and Industrial Technology Development Organization (NEDO). This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan/). The authors wish to express their sincere appreciation to Jens Domke, Aleksandr Drozd, Emil Vatai, and other RIKEN R-CCS colleagues for their invaluable advice and guidance throughout the course of this research. Finally, the first author would also like to express his gratitude to RIKEN R-CCS for offering the opportunity to undertake this research in an internship program.

Keywords

  • GPU
  • stencil
  • temporal blocking optimizations
