Abstract
This paper presents a unified communication optimization framework for sparse triangular solve (SpTRSV) algorithms on CPU and GPU clusters. The framework builds on a 3D communication-avoiding (CA) layout of $P_x \times P_y \times P_z$ processes that divides a sparse matrix into $P_z$ submatrices, each handled by a $P_x \times P_y$ 2D process grid with block-cyclic distribution. We propose three communication optimization strategies. First, we develop a new 3D SpTRSV algorithm that trades inter-grid communication and synchronization for replicated computation. This design requires only one inter-grid synchronization, and the inter-grid communication is implemented efficiently with sparse allreduce operations. Second, we use broadcast and reduction communication trees to reduce the message latency of intra-grid 2D communication on CPU clusters. Third, we leverage GPU-initiated one-sided communication to implement the communication trees on GPU clusters. With these nested inter- and intra-grid communication optimization strategies, the proposed 3D SpTRSV algorithm attains speedups of up to 3.45x over the baseline 3D SpTRSV algorithm using up to 2048 Haswell CPU cores on Cori. In addition, the proposed GPU 3D SpTRSV algorithm achieves speedups of up to 6.5x over the proposed CPU 3D SpTRSV algorithm with $P_z$ up to 64. Finally, the proposed GPU 3D SpTRSV scales to 256 GPUs on the Perlmutter system, whereas the existing 2D SpTRSV algorithm scales only up to 4 GPUs.
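To make the layout and the communication trees more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard 2D block-cyclic rule (block $(I, J)$ lives on grid process $(I \bmod P_x, J \bmod P_y)$) and uses a generic heap-ordered binary tree for broadcast and reduction. The functions `block_owner`, `global_rank`, `binary_bcast_tree`, `tree_depth`, and `tree_reduce`, and the rank ordering in `global_rank`, are hypothetical illustrations.

```python
from typing import Dict, List, Tuple


def block_owner(i: int, j: int, px: int, py: int) -> Tuple[int, int]:
    """2D block-cyclic owner (x, y) of block (i, j) on a px x py grid."""
    return (i % px, j % py)


def global_rank(x: int, y: int, z: int, px: int, py: int) -> int:
    """Hypothetical linearization of process coordinate (x, y, z).

    Assumption: row-major order inside each 2D grid, grids stacked along z.
    """
    return z * px * py + x * py + y


def binary_bcast_tree(ranks: List[int]) -> Dict[int, List[int]]:
    """Children of each rank in a binary broadcast tree rooted at ranks[0].

    Position k forwards the message to positions 2k+1 and 2k+2, so the
    message reaches all n participants after about log2(n) forwarding
    levels instead of n - 1 sequential sends from the root.
    """
    n = len(ranks)
    children: Dict[int, List[int]] = {}
    for k, r in enumerate(ranks):
        children[r] = [ranks[c] for c in (2 * k + 1, 2 * k + 2) if c < n]
    return children


def tree_depth(n: int) -> int:
    """Forwarding levels of the heap-ordered binary tree with n >= 1 nodes."""
    return n.bit_length() - 1


def tree_reduce(ranks: List[int], values: Dict[int, float]) -> float:
    """Sum partial results up the same tree (reverse of the broadcast).

    Visiting positions in descending order guarantees that each node has
    already absorbed its whole subtree before it is added to its parent.
    """
    acc = {r: values[r] for r in ranks}
    for k in range(len(ranks) - 1, 0, -1):
        acc[ranks[(k - 1) // 2]] += acc[ranks[k]]
    return acc[ranks[0]]


if __name__ == "__main__":
    print(block_owner(5, 7, 2, 3))          # owner inside a 2 x 3 grid
    print(global_rank(1, 1, 2, 2, 3))       # same process on grid z = 2
    print(binary_bcast_tree([0, 1, 2, 3, 4]))
    print(tree_reduce([0, 1, 2, 3], {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}))
```

The reduction reuses the broadcast tree in reverse, which is a common design choice because both directions then share the same logarithmic depth; how the actual framework shapes its trees, and how it splits the matrix across the $P_z$ grids, is not specified here.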
| Original language | English |
|---|---|
| Title of host publication | SC 2023 - International Conference for High Performance Computing, Networking, Storage and Analysis |
| Publisher | IEEE Computer Society |
| ISBN (Electronic) | 9798400701092 |
| State | Published - 2023 |
| Event | 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023 - Denver, United States. Duration: Nov 12, 2023 → Nov 17, 2023 |
Publication series
| Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
|---|---|
| ISSN (Print) | 2167-4329 |
| ISSN (Electronic) | 2167-4337 |
Conference
| Conference | 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023 |
|---|---|
| Country/Territory | United States |
| City | Denver |
| Period | 11/12/23 → 11/17/23 |
Funding
This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, and in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program through the FASTMath Institute under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. This research also used resources of the Oak Ridge Leadership Computing Facility, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- NVSHMEM
- SpTRSV
- communication optimization
- communication-avoiding algorithm
- sparse matrix
- supernodal method
- triangular solve