Integrating batched sparse iterative solvers for the collision operator in fusion plasma simulations on GPUs

Aditya Kashi, Pratik Nayak, Dhruva Kulkarni, Aaron Scheinberg, Paul Lin, Hartwig Anzt

Research output: Contribution to journal › Article › peer-review

Abstract

Batched linear solvers, which solve many small, related but independent problems, are increasingly important for highly parallel processors such as graphics processing units (GPUs). A GPU needs a substantial amount of work to operate efficiently, so solving small problems one by one is not an option. Because each problem is small, designing a parallel partitioning scheme and mapping the problems to the hardware is not trivial. In recent years, significant attention has been given to batched dense linear algebra. However, there is also interest in using sparse iterative solvers in batched form. An example use case is found in XGC, a gyrokinetic Particle-In-Cell (PIC) code used for modeling magnetically confined fusion plasma devices. Its collision operator has been identified as a bottleneck, and a proxy app has been created to facilitate optimization and porting to GPUs. The linear solver in the current collision kernel does not run on the GPU, which is a major bottleneck. Because these matrices are sparse and well-conditioned, batched sparse iterative solvers are an attractive option. A batched sparse iterative solver capability has recently been developed in the GINKGO library. In this paper, we describe how GINKGO's batched solver technology can be integrated into the XGC collision kernel to accelerate the simulation. Solve times for matrices from the XGC collision kernel on NVIDIA V100 and A100 GPUs and AMD MI100 GPUs are compared with those on one dual-socket Intel Xeon Skylake CPU node with 40 cores. Further, the speedups observed for the overall collision kernel are presented relative to different modern CPUs on multiple supercomputer systems.
The results suggest that GINKGO's batched sparse iterative solvers are well suited to using the GPU efficiently for this problem, and that the performance portability of GINKGO in conjunction with Kokkos (used within XGC as the heterogeneous programming model) allows seamless execution on exascale-oriented heterogeneous architectures.
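
The abstract's central idea, many small, independent sparse systems solved together, can be made concrete with a minimal CPU-side sketch. It is illustrative only and is not GINKGO's API: it assumes that all systems in the batch share one CSR sparsity pattern with per-system nonzero values, and it uses Jacobi-preconditioned BiCGSTAB as a stand-in Krylov method. The names BatchCsr, spmv, bicgstab_one and batch_solve are hypothetical. The plain Python loop over systems stands in for what a GPU implementation would typically run concurrently.

```python
"""Minimal sketch of a batched sparse iterative solve (illustrative only).

Assumptions: every system in the batch shares one CSR sparsity pattern
(row_ptrs, col_idxs) and has its own nonzero values; each system is solved
independently with Jacobi-preconditioned BiCGSTAB. This is NOT GINKGO's API.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BatchCsr:
    num_rows: int
    row_ptrs: List[int]          # shared by every system in the batch
    col_idxs: List[int]          # shared by every system in the batch
    values: List[List[float]]    # values[b][k]: k-th stored nonzero of system b


def spmv(a: BatchCsr, b: int, x: List[float]) -> List[float]:
    """Return A_b @ x for system b, using the shared pattern."""
    y = [0.0] * a.num_rows
    for i in range(a.num_rows):
        for k in range(a.row_ptrs[i], a.row_ptrs[i + 1]):
            y[i] += a.values[b][k] * x[a.col_idxs[k]]
    return y


def dot(u: List[float], v: List[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def bicgstab_one(a: BatchCsr, b: int, rhs: List[float],
                 tol: float = 1e-10, max_iters: int = 200) -> Tuple[List[float], int]:
    """Right-preconditioned BiCGSTAB for system b (Jacobi preconditioner)."""
    n = a.num_rows
    inv_diag = [1.0] * n  # inverse diagonal of A_b (diagonal assumed nonzero)
    for i in range(n):
        for k in range(a.row_ptrs[i], a.row_ptrs[i + 1]):
            if a.col_idxs[k] == i:
                inv_diag[i] = 1.0 / a.values[b][k]
    x = [0.0] * n
    r = list(rhs)            # initial guess x = 0, so r = rhs
    r_hat = list(r)          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    stop = tol * (dot(rhs, rhs) ** 0.5 or 1.0)  # relative residual target
    for it in range(max_iters):
        if dot(r, r) ** 0.5 <= stop:
            return x, it
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        p_hat = [d * pi for d, pi in zip(inv_diag, p)]   # M^{-1} p
        v = spmv(a, b, p_hat)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 <= stop:                      # converged mid-step
            return [xi + alpha * ph for xi, ph in zip(x, p_hat)], it + 1
        s_hat = [d * si for d, si in zip(inv_diag, s)]   # M^{-1} s
        t = spmv(a, b, s_hat)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * ph + omega * sh for xi, ph, sh in zip(x, p_hat, s_hat)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
    return x, max_iters


def batch_solve(a: BatchCsr, rhs_batch: List[List[float]],
                tol: float = 1e-10) -> Tuple[List[List[float]], List[int]]:
    """Solve every system independently; a GPU version would run these concurrently."""
    results = [bicgstab_one(a, b, rhs_batch[b], tol) for b in range(len(a.values))]
    return [x for x, _ in results], [its for _, its in results]


# Usage: two 3x3 tridiagonal systems sharing one pattern, with different values.
a = BatchCsr(
    num_rows=3,
    row_ptrs=[0, 2, 5, 7],
    col_idxs=[0, 1, 0, 1, 2, 1, 2],
    values=[[4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],   # exact solution [1, 2, 3]
            [5.0, 1.0, 2.0, 6.0, 1.0, 1.0, 3.0]],      # exact solution [1, 1, 1]
)
rhs = [[2.0, 4.0, 10.0], [6.0, 9.0, 4.0]]
xs, iters = batch_solve(a, rhs)
print(xs, iters)
```

Storing the sparsity pattern once and only the values per system is what keeps the memory footprint small when the batch is large; the Jacobi preconditioner is a plausible choice here only because the abstract describes the matrices as well-conditioned.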

Original language: English
Pages (from-to): 69-81
Number of pages: 13
Journal: Journal of Parallel and Distributed Computing
Volume: 178
State: Published - Aug 2023

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. It used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. Some work in this paper was also performed on the HoreKa supercomputer funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research, Germany. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231.

Funders                                        Funder number
Office of Science                              DE-AC05-00OR22725
National Nuclear Security Administration
Lawrence Berkeley National Laboratory          DE-AC02-05CH11231
Bundesministerium für Bildung und Forschung
Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg

    Keywords

    • Batched solvers
    • GPU
    • Performance portability
    • Plasma simulation
    • Sparse linear systems
