PERKS: A Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

  • Lingqi Zhang
  • Mohamed Wahib
  • Peng Chen
  • Jintao Meng
  • Xiao Wang
  • Toshio Endo
  • Satoshi Matsuoka

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

8 Scopus citations

Abstract

Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel once per time/algorithm step. The termination of each kernel implicitly acts as the barrier required after advancing the solution every time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this model, the time loop is moved inside a persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching a subset of the output of each time step in the unused registers and shared memory. PERKS can be generalized to any iterative solver: it is largely independent of the solver's implementation. We explain the design principle of PERKS and demonstrate the effectiveness of PERKS for a wide range of iterative 2D/3D stencil benchmarks (geomean speedup of 2.12x for 2D stencils and 1.24x for 3D stencils over state-of-the-art libraries), and a Krylov subspace conjugate gradient solver (geomean speedup of 4.86x in smaller SpMV datasets from SuiteSparse and 1.43x in larger SpMV datasets over a state-of-the-art library). All PERKS-based implementations are available at: https://github.com/neozhang307/PERKS.
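
The execution model described in the abstract can be illustrated with a minimal CUDA sketch. This is not the authors' implementation (see the repository linked above for that); it is a toy 1D three-point Jacobi stencil written under stated assumptions. The names `jacobi1d_persistent`, `run_demo`, `TILE`, and the `halo` exchange buffer are hypothetical, the problem size is assumed to be a multiple of `TILE`, and only shared-memory caching is shown (register caching is omitted). The time loop lives inside one cooperatively launched kernel, `grid.sync()` serves as the device-wide barrier, and each block reads and writes its tile in device memory once instead of once per step.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;
constexpr int TILE = 128;  // threads (and elements) per block; illustrative

// Hypothetical persistent kernel: the time loop runs on the device and each
// block keeps its tile in shared memory across all steps.
__global__ void jacobi1d_persistent(float* data, float* halo, int n, int steps) {
    cg::grid_group grid = cg::this_grid();
    __shared__ float tile[TILE];
    const int t = threadIdx.x, b = blockIdx.x, nb = gridDim.x;
    const int g = b * TILE + t;                 // global element index
    tile[t] = data[g];                          // one load from device memory
    __syncthreads();

    for (int s = 0; s < steps; ++s) {
        float* h = halo + (s & 1) * 2 * nb;     // double-buffered edge exchange
        if (t == 0)        h[2 * b]     = tile[0];
        if (t == TILE - 1) h[2 * b + 1] = tile[TILE - 1];
        grid.sync();                            // device-wide barrier per step

        const float c = tile[t];
        const float l = (t > 0) ? tile[t - 1] : (b > 0 ? h[2 * (b - 1) + 1] : c);
        const float r = (t < TILE - 1) ? tile[t + 1] : (b < nb - 1 ? h[2 * (b + 1)] : c);
        const float next = (g == 0 || g == n - 1) ? c : (l + c + r) / 3.0f;
        __syncthreads();                        // all reads of the old tile done
        tile[t] = next;                         // result stays on chip
        __syncthreads();
    }
    data[g] = tile[t];                          // one store to device memory
}

// Runs `steps` sweeps on the GPU and returns the max abs difference from a
// plain CPU reference (INFINITY on a CUDA error). n must be a multiple of TILE.
float run_demo(int n, int steps) {
    const int nb = n / TILE;
    std::vector<float> a(n), ref(n), tmp(n);
    for (int i = 0; i < n; ++i) a[i] = ref[i] = static_cast<float>(i % 17);
    for (int s = 0; s < steps; ++s) {           // CPU reference, fixed end points
        tmp = ref;
        for (int i = 1; i < n - 1; ++i)
            ref[i] = (tmp[i - 1] + tmp[i] + tmp[i + 1]) / 3.0f;
    }

    float *d_data = nullptr, *d_halo = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_halo, 4 * nb * sizeof(float));  // 2 buffers x 2 edges x nb
    cudaMemcpy(d_data, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // A cooperative launch is required for grid.sync(); all blocks must be
    // resident on the device at the same time.
    void* args[] = {&d_data, &d_halo, &n, &steps};
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)jacobi1d_persistent, dim3(nb), dim3(TILE), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    cudaMemcpy(a.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_halo);
    if (err != cudaSuccess) return INFINITY;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) max_err = std::fmax(max_err, std::fabs(a[i] - ref[i]));
    return max_err;
}

int main() {
    std::printf("max abs error vs CPU reference: %g\n", run_demo(1024, 100));
    return 0;
}
```

The sketch should build with `nvcc` on a GPU that supports cooperative launches (older toolkits may additionally need `-rdc=true`). In a conventional host-loop version every step would read and write the full array in device memory; here only the two edge values per block cross device memory each step, which is the locality effect the abstract attributes to caching.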

Original language: English
Title of host publication: ACM ICS 2023 - Proceedings of the International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 167-179
Number of pages: 13
ISBN (Electronic): 9798400700569
State: Published - Jun 21 2023
Event: 37th ACM International Conference on Supercomputing, ICS 2023 - Orlando, United States
Duration: Jun 21 2023 to Jun 23 2023

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 37th ACM International Conference on Supercomputing, ICS 2023
Country/Territory: United States
City: Orlando
Period: 06/21/23 to 06/23/23

Funding

This work was supported by JSPS KAKENHI under Grant Numbers JP22H03600 and JP21K17750. This work was supported by JST, PRESTO Grant Number JPMJPR20MA, Japan. This paper is based on results obtained from the JPNP20006 project, commissioned by the New Energy and Industrial Technology Development Organization (NEDO). This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan/). The authors wish to express their sincere appreciation to Jens Domke, Aleksandr Drozd, Emil Vatai and other RIKEN R-CCS colleagues for their invaluable advice and guidance throughout the course of this research. They also wish to thank Dr. Zhao Tuowen from SambaNova for the helpful discussions. Finally, the first author would also like to express his gratitude to RIKEN R-CCS for offering the opportunity to undertake this research in an internship program.

Keywords

  • GPU
  • iterative solvers
  • memory-bound
  • persistent kernel
