Abstract
Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent to irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overhead. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly ELLPACK (ELL) format. The ratio between the ELL part and the COO part is determined using a theoretical analysis of the nonzeros-per-row distribution. For the more than 2,800 test matrices available in the SuiteSparse Matrix Collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and against a heavily tuned sliced ELL (SELL-P) kernel that avoids unnecessary padding by treating the irregular matrices as a combination of matrix blocks stored in ELL format.
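The abstract contrasts padding-based storage (ELL) with formats that tolerate irregular row lengths (COO, CSR). As a minimal illustration of why COO is attractive for unbalanced matrices, the CUDA sketch below assigns one thread per stored nonzero, so per-thread work is identical regardless of the nonzeros-per-row distribution; the price is atomic accumulation into the output vector. This is only a schematic example under assumed names (coo_spmv_scalar, d_row, d_col, d_val), not the load-balancing kernel developed in the article.

```cuda
#include <cuda_runtime.h>

// Schematic scalar COO SpMV: y += A * x, with A stored as (row_idx, col_idx, val)
// triplets. One thread per nonzero gives perfect load balance for irregular
// matrices, but concurrent updates to the same output row must be atomic.
// Double-precision atomicAdd requires compute capability >= 6.0.
__global__ void coo_spmv_scalar(int nnz, const int *row_idx, const int *col_idx,
                                const double *val, const double *x, double *y)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < nnz) {
        atomicAdd(&y[row_idx[k]], val[k] * x[col_idx[k]]);
    }
}

// Illustrative launch (y must be zero-initialized on the device beforehand):
//   int threads = 256;
//   int blocks  = (nnz + threads - 1) / threads;
//   coo_spmv_scalar<<<blocks, threads>>>(nnz, d_row, d_col, d_val, d_x, d_y);
```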
Original language | English |
---|---|
Article number | 2 |
Journal | ACM Transactions on Parallel Computing |
Volume | 7 |
Issue number | 1 |
DOIs | https://doi.org/10.1145/3380930 |
State | Published - Mar 2020 |
Externally published | Yes |
Funding
This work was supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241, and by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC0016513 and DE-SC-0010042.
Funders | Funder number |
---|---|
U.S. Department of Energy Office of Science | |
Advanced Scientific Computing Research | DE-SC-0010042, DE-SC0016513 |
Helmholtz Association | VH-NG-1241 |
Keywords
- GPUs
- Sparse Matrix Vector Product (SpMV)
- irregular matrices