Load-balancing Sparse Matrix Vector Product Kernels on GPUs

Hartwig Anzt, Terry Cojean, Chen Yen-Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, Weichung Wang

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overhead. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly ELLPACK (ELL) format. The ratio between the ELL and COO parts is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the SuiteSparse matrix collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format.
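The hybrid format described in the abstract stores the first k entries of every row in ELL (regular, SIMD-friendly, but padded with explicit zeros for short rows) and hands the remaining entries to a load-balanced COO kernel. The sketch below illustrates the trade-off being balanced when choosing k; it is an illustrative heuristic that caps the padding overhead, not the paper's actual criterion, which is derived from a theoretical analysis of the nonzeros-per-row distribution. The function name and the `max_padding_ratio` parameter are hypothetical.

```python
import numpy as np

def choose_ell_width(row_nnz, max_padding_ratio=0.25):
    """Pick an ELL column count k for a hybrid ELL/COO split.

    The first min(nnz_i, k) entries of row i go to the ELL part; rows
    shorter than k are padded with explicit zeros, and entries beyond
    column k go to the COO part. This heuristic returns the widest k
    whose padding stays below a fraction of the nonzeros stored in ELL.
    (Illustrative only; the paper determines the split differently.)
    """
    row_nnz = np.asarray(row_nnz)
    best_k = 0
    for k in range(1, int(row_nnz.max()) + 1):
        stored = np.minimum(row_nnz, k).sum()        # nonzeros kept in ELL
        padding = (k - np.minimum(row_nnz, k)).sum() # explicit zeros added
        if padding <= max_padding_ratio * stored:
            best_k = k  # widest k still within the padding budget
    return best_k
```

For a perfectly regular matrix (all rows equally long) the heuristic selects the full row length and the COO part is empty; for a matrix with one very long row it keeps k small, so the long row's tail is processed by the load-balanced COO kernel instead of forcing padding onto every short row.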

Original language: English
Article number: 2
Journal: ACM Transactions on Parallel Computing
Volume: 7
Issue number: 1
DOI: 10.1145/3380930
State: Published - March 2020
Externally published: Yes

Funding

This work was supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241, and by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under awards DE-SC0016513 and DE-SC-0010042.

Authors’ addresses: H. Anzt, Karlsruhe Institute of Technology, Germany, and University of Tennessee, USA; T. Cojean, Karlsruhe Institute of Technology, Germany; Y.-C. Chen, University of Tokyo, Japan; J. Dongarra, University of Tennessee, USA, Oak Ridge National Laboratory, USA, and University of Manchester, UK; G. Flegar, Universitat Jaume I, Spain; P. Nayak, Karlsruhe Institute of Technology, Germany; S. Tomov, University of Tennessee, USA; Y. M. Tsai, Karlsruhe Institute of Technology, Germany; W. Wang, National Taiwan University, Taiwan.

© 2020 Association for Computing Machinery. https://doi.org/10.1145/3380930

Funders and funder numbers:
• U.S. Department of Energy Office of Science
• Advanced Scientific Computing Research: DE-SC-0010042, DE-SC0016513
• Helmholtz Association: VH-NG-1241

Keywords

• GPUs
• Sparse Matrix Vector Product (SpMV)
• irregular matrices
