Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations

Aditya Kashi, Pratik Nayak, Dhruva Kulkarni, Aaron Scheinberg, Paul Lin, Hartwig Anzt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Batched linear solvers, which solve many small, related but independent problems, are important in several applications. This is increasingly the case for highly parallel processors such as graphics processing units (GPUs), which need a substantial amount of concurrent work to operate efficiently, so solving the small problems one by one is not an option. Because each individual problem is small, devising a parallel partitioning scheme and mapping the problems to the hardware is not trivial. In recent years, significant attention has been given to batched dense linear algebra. However, there is also interest in using sparse iterative solvers in batched form, which presents further challenges. An example use case arises in a gyrokinetic particle-in-cell (PIC) code used for modeling magnetically confined fusion plasma devices. The collision operator has been identified as a bottleneck in this code, and a proxy app has been created to facilitate optimizations and porting to GPUs. The current collision-kernel linear solver does not run on the GPU, which is a major bottleneck. Since the matrices involved are well conditioned, batched sparse iterative solvers are an attractive option. Such a capability has recently been developed in the Ginkgo library. In this paper, we describe how this software architecture can be used to develop an efficient solution for the XGC collision proxy app. We compare solve times on NVIDIA V100 and A100 GPUs and AMD MI100 GPUs against a dual-socket Intel Xeon Skylake CPU node with 40 OpenMP threads, for matrices representative of those required in the collision kernel of XGC. The results suggest that Ginkgo's batched sparse iterative solvers are well suited to efficient utilization of the GPU for this problem, and that the performance portability of Ginkgo, in conjunction with Kokkos (used within XGC as the heterogeneous programming model), allows seamless execution on the exascale-oriented heterogeneous architectures at the various leadership supercomputing facilities.
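To make the batching idea concrete, below is a minimal, self-contained C++ sketch of a batched sparse iterative solve. It is illustrative only, not the paper's implementation and not Ginkgo's API: a batch of small CSR systems shares a single sparsity pattern (one copy of the row-pointer and column-index arrays, with per-system values), and each system is solved independently, so on a GPU the loop over systems becomes the parallel dimension (for example, one system per thread block). All names here (BatchedCsr, spmv, cg) are hypothetical, and a plain conjugate gradient method on synthetic SPD tridiagonal systems stands in for the batched iterative solvers Ginkgo provides.

    // Illustrative sketch of batched sparse iterative solving (hypothetical
    // code, not Ginkgo's API). All systems share one sparsity pattern; only
    // the values differ per system. The outer loop over systems in main() is
    // the dimension a GPU implementation would parallelize.
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct BatchedCsr {
        int n;                       // rows/cols of each small system
        int num_systems;             // number of systems in the batch
        std::vector<int> row_ptr;    // shared across the batch, size n+1
        std::vector<int> col_idx;    // shared across the batch, size nnz
        std::vector<double> values;  // per-system, size num_systems * nnz
    };

    // y = A_k * x for system k of the batch
    void spmv(const BatchedCsr& A, int k, const double* x, double* y) {
        const int nnz = A.row_ptr[A.n];
        const double* val = A.values.data() + static_cast<std::size_t>(k) * nnz;
        for (int i = 0; i < A.n; ++i) {
            double s = 0.0;
            for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; ++j)
                s += val[j] * x[A.col_idx[j]];
            y[i] = s;
        }
    }

    // Unpreconditioned CG on system k (x starts at zero); returns iterations.
    int cg(const BatchedCsr& A, int k, const double* b, double* x,
           double tol, int max_it) {
        const int n = A.n;
        std::vector<double> r(b, b + n), p(r), Ap(n);
        double rho = 0.0;
        for (int i = 0; i < n; ++i) { x[i] = 0.0; rho += r[i] * r[i]; }
        for (int it = 0; it < max_it; ++it) {
            if (std::sqrt(rho) < tol) return it;
            spmv(A, k, p.data(), Ap.data());
            double pAp = 0.0;
            for (int i = 0; i < n; ++i) pAp += p[i] * Ap[i];
            const double alpha = rho / pAp;
            double rho_new = 0.0;
            for (int i = 0; i < n; ++i) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
                rho_new += r[i] * r[i];
            }
            const double beta = rho_new / rho;
            for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
            rho = rho_new;
        }
        return max_it;
    }

    int main() {
        // Batch of SPD tridiagonal systems A_k = tridiag(-1, 2 + 0.1k, -1).
        const int n = 8, num_systems = 4;
        BatchedCsr A;
        A.n = n;
        A.num_systems = num_systems;
        A.row_ptr.push_back(0);
        for (int i = 0; i < n; ++i) {
            if (i > 0) A.col_idx.push_back(i - 1);
            A.col_idx.push_back(i);
            if (i < n - 1) A.col_idx.push_back(i + 1);
            A.row_ptr.push_back(static_cast<int>(A.col_idx.size()));
        }
        const int nnz = A.row_ptr[n];
        A.values.resize(static_cast<std::size_t>(num_systems) * nnz);
        for (int k = 0; k < num_systems; ++k)
            for (int i = 0; i < n; ++i)
                for (int j = A.row_ptr[i]; j < A.row_ptr[i + 1]; ++j)
                    A.values[static_cast<std::size_t>(k) * nnz + j] =
                        (A.col_idx[j] == i) ? 2.0 + 0.1 * k : -1.0;

        std::vector<double> b(n, 1.0), x(n);
        for (int k = 0; k < num_systems; ++k) {  // parallel dimension on a GPU
            int its = cg(A, k, b.data(), x.data(), 1e-10, 200);
            std::printf("system %d: converged in %d iterations\n", k, its);
        }
        return 0;
    }

Sharing the index arrays across the batch is what distinguishes the batched-sparse case from batched dense linear algebra: per-system storage and memory traffic reduce to the values array alone, which matters for the many small, structurally similar systems the collision kernel produces. The paper's actual solvers and data structures are those of Ginkgo's batched capability described in the abstract, not this toy CG.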

Original language: English
Title of host publication: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 157-167
Number of pages: 11
ISBN (Electronic): 9781665481069
State: Published - 2022
Externally published: Yes
Event: 36th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022 - Virtual, Online, France
Duration: May 30, 2022 - Jun 3, 2022

Publication series

Name: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022

Conference

Conference: 36th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022
Country/Territory: France
City: Virtual, Online
Period: 05/30/22 - 06/03/22

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. Some of the work presented in this paper was also performed on the HoreKa supercomputer, funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research, Germany.

Funders:
• Office of Science
• National Nuclear Security Administration
• Bundesministerium für Bildung und Forschung
• Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg

Keywords

• GPU
• Ginkgo
• ITER
• Sparse linear systems
• WDMApp
• XGC
• batched solvers
• fusion
• performance portability
• simulation
