Abstract
We propose a variety of batched routines for concurrently processing a large collection of small, independent sparse matrix-vector products (SpMV) on graphics processing units (GPUs). These batched SpMV kernels are designed to be flexible in order to handle a batch of matrices that differ in size, nonzero count, and nonzero distribution. Furthermore, they support the three most commonly used sparse storage formats: CSR, COO, and ELL. Our experimental results on a state-of-the-art GPU reveal performance improvements of up to 25× compared to non-batched SpMV routines.
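To make the batched-CSR idea concrete, the sketch below shows a sequential CPU reference of the operation the abstract describes: applying each matrix of a variable-size batch to its own vector. The struct layout and function names are illustrative assumptions, not the paper's actual API or GPU kernels; the paper's contribution is the GPU implementation of this pattern.

```c
#include <stddef.h>

/* Hypothetical per-matrix CSR descriptor: each matrix in the batch
 * may have a different row count and nonzero count. */
typedef struct {
    int n;               /* number of rows */
    const int *row_ptr;  /* n+1 row offsets into col_idx/val */
    const int *col_idx;  /* column index of each nonzero */
    const double *val;   /* value of each nonzero */
} csr_matrix;

/* Reference batched SpMV: computes y_b = A_b * x_b for each
 * matrix b in the batch, one after another. A GPU version would
 * instead map matrices (or groups of rows) to thread blocks. */
void batched_spmv_csr(int batch, const csr_matrix *A,
                      const double *const *x, double *const *y)
{
    for (int b = 0; b < batch; ++b) {
        for (int i = 0; i < A[b].n; ++i) {
            double sum = 0.0;
            for (int k = A[b].row_ptr[i]; k < A[b].row_ptr[i + 1]; ++k)
                sum += A[b].val[k] * x[b][A[b].col_idx[k]];
            y[b][i] = sum;
        }
    }
}
```

The point of the batched formulation is that the many small, independent products share one kernel launch on the GPU, amortizing launch overhead that would dominate if each tiny SpMV were dispatched separately.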
Original language | English |
---|---|
Title of host publication | Proceedings of ScalA 2017 |
Subtitle of host publication | 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Association for Computing Machinery, Inc |
ISBN (Print) | 9781450351256 |
DOIs | |
State | Published - Nov 12 2017 |
Event | 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2017 - Held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 - Denver, United States. Duration: Nov 12 2017 → Nov 17 2017 |
Publication series
Name | Proceedings of ScalA 2017: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis |
---|
Conference
Conference | 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2017 - Held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 |
---|---|
Country/Territory | United States |
City | Denver |
Period | 11/12/17 → 11/17/17 |
Funding
This work was partly funded by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC0010042. H. Anzt was supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241. G. Flegar and E. S. Quintana-Ortí were supported by projects TIN2014-53495-R of the Spanish Ministerio de Economía y Competitividad and the EU H2020 project 732631 OPRECOMP.
Keywords
- Batched routines
- GPUs
- Sparse matrix-vector product