Abstract
Many domains of scientific simulation (chemistry, condensed matter physics, data science) increasingly eschew dense tensors for block-sparse tensors, sometimes with additional structure (recursive hierarchy, rank sparsity, etc.). Distributed-memory parallel computation with block-sparse tensorial data is paramount for minimizing the time-to-solution (e.g., to study dynamical problems or for real-time analysis) and for accommodating problems of realistic size that are too large to fit into the host/device memory of a single node equipped with accelerators. Unfortunately, computation with such irregular data structures is a poor match for the dominant imperative, bulk-synchronous parallel programming model. In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime. We demonstrate high performance of block-sparse tensor contraction on the Summit supercomputer for synthetic data as well as for real data involved in electronic structure simulations of unprecedented size.
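To make the central operation concrete, the following is a minimal, self-contained C++ sketch of block-sparse matrix multiplication, the simplest binary tensor contraction. The tile size, the map-based tile store, and all identifiers here are illustrative assumptions, not the paper's implementation; the actual work distributes tiles across nodes and GPUs and schedules the per-tile GEMMs as tasks through the PaRSEC runtime.

```cpp
// Illustrative sketch only: a serial block-sparse matrix multiplication.
// The real implementation described in the paper partitions tiles over
// distributed memory and dispatches per-tile GEMM tasks via PaRSEC.
#include <cstddef>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

using Block       = std::vector<double>;                     // dense tile, row-major, TILE x TILE
using BlockKey    = std::pair<std::size_t, std::size_t>;     // (block-row, block-col)
using BlockSparse = std::map<BlockKey, Block>;               // only nonzero tiles are stored

constexpr std::size_t TILE = 2;  // illustrative tile edge length

// Dense GEMM on a single pair of tiles: C += A * B.
void tile_gemm(const Block& a, const Block& b, Block& c) {
    for (std::size_t i = 0; i < TILE; ++i)
        for (std::size_t k = 0; k < TILE; ++k)
            for (std::size_t j = 0; j < TILE; ++j)
                c[i * TILE + j] += a[i * TILE + k] * b[k * TILE + j];
}

// Block-sparse C = A * B over an nb x nb grid of tiles: a per-tile GEMM is
// issued only when both input tiles exist, so the work scales with the
// number of contributing block pairs, not with the dense iteration space.
BlockSparse contract(const BlockSparse& A, const BlockSparse& B, std::size_t nb) {
    BlockSparse C;
    for (const auto& [akey, ablk] : A) {
        const auto [i, k] = akey;
        for (std::size_t j = 0; j < nb; ++j) {
            const auto bit = B.find({k, j});
            if (bit == B.end()) continue;  // skip zero blocks of B
            auto& cblk = C.try_emplace({i, j}, Block(TILE * TILE, 0.0)).first->second;
            tile_gemm(ablk, bit->second, cblk);
        }
    }
    return C;
}

int main() {
    // Two 2x2 block matrices with a single nonzero tile each.
    BlockSparse A, B;
    A[{0, 1}] = {1, 2, 3, 4};
    B[{1, 0}] = {5, 6, 7, 8};
    const BlockSparse C = contract(A, B, 2);
    std::cout << "nonzero C tiles: " << C.size() << '\n';  // prints 1
}
```

Storing only the nonzero tiles is what makes the flop count proportional to the number of contributing block pairs; this irregular, data-dependent task graph is precisely what the paper maps onto a task-based runtime rather than a bulk-synchronous schedule.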
Original language | English
---|---
Title of host publication | Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
Publisher | Institute of Electrical and Electronics Engineers Inc.
Pages | 537-546
Number of pages | 10
ISBN (Electronic) | 9781665440660
DOIs |
State | Published - May 2021
Externally published | Yes
Event | 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021, Virtual, Online; Duration: May 17 2021 → May 21 2021
Publication series
Name | Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
---|---
Conference
Conference | 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021
---|---
City | Virtual, Online
Period | 05/17/21 → 05/21/21
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC) and by NSF awards #1931347, #1931384, and #1931387. It used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- Block-sparse matrix multiplication
- Distributed memory
- Electronic structure
- Multi-GPU nodes
- PaRSEC
- Tensor contraction