TY - GEN
T1 - BCSR on GPU
T2 - 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
AU - Sattar, Naw Safrin
AU - Lu, Hao
AU - Wang, Feiyi
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Handling large graphs in a distributed environment requires effective partitioning across processors and efficient management of local partitions. In 2D partitioning, local graphs often become too sparse, making memory-efficient data structures crucial. The Compressed Sparse Row (CSR) format wastes space on such sparse graphs, where more than 83% of vertices have no edges. This study explores bit-CSR (BCSR), a modified CSR representation, on GPUs to reduce memory usage in graph computations. We achieved 16.67% memory savings on a sparse rmat dataset with 268 million vertices and 357 million edges, without performance degradation, supported by both theoretical and experimental storage savings of 33%. However, we observed a 1.7× slowdown in degree lookup times due to bitwise operations on AMD CPUs. This analysis highlights the potential of BCSR on GPUs for improving Graph500 benchmark performance on GPU-accelerated systems, such as the Frontier supercomputer.
AB - Handling large graphs in a distributed environment requires effective partitioning across processors and efficient management of local partitions. In 2D partitioning, local graphs often become too sparse, making memory-efficient data structures crucial. The Compressed Sparse Row (CSR) format wastes space on such sparse graphs, where more than 83% of vertices have no edges. This study explores bit-CSR (BCSR), a modified CSR representation, on GPUs to reduce memory usage in graph computations. We achieved 16.67% memory savings on a sparse rmat dataset with 268 million vertices and 357 million edges, without performance degradation, supported by both theoretical and experimental storage savings of 33%. However, we observed a 1.7× slowdown in degree lookup times due to bitwise operations on AMD CPUs. This analysis highlights the potential of BCSR on GPUs for improving Graph500 benchmark performance on GPU-accelerated systems, such as the Frontier supercomputer.
KW - 2D Partitioning
KW - AMD GPU
KW - Breadth First Search (BFS)
KW - Compressed Sparse Row (CSR) Graph
KW - HIP
KW - Large-scale Graph
UR - http://www.scopus.com/inward/record.url?scp=85217154310&partnerID=8YFLogxK
U2 - 10.1109/SCW63240.2024.00044
DO - 10.1109/SCW63240.2024.00044
M3 - Conference contribution
AN - SCOPUS:85217154310
T3 - Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 280
EP - 289
BT - Proceedings of SC 2024-W
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 November 2024 through 22 November 2024
ER -