TY - GEN
T1 - Accelerating Multi-Process Communication for Parallel 3-D FFT
AU - Ayala, Alan
AU - Tomov, Stan
AU - Stoyanov, Miroslav
AU - Haidar, Azzam
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Today's largest and most powerful supercomputers are built on heterogeneous platforms, and using the combined power of multi-core CPUs and GPUs has had a great impact on accelerating large-scale applications. However, on these architectures, inter-processor communication becomes a bottleneck for parallel algorithms such as the Fast Fourier Transform (FFT) and limits their scalability. In this paper, we present techniques for reducing multi-process communication cost during the computation of FFTs, considering hybrid network connections such as those expected on upcoming exascale machines. Our techniques include algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and the computational resources available. We present several experiments performed on the Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V100 GPUs.
AB - Today's largest and most powerful supercomputers are built on heterogeneous platforms, and using the combined power of multi-core CPUs and GPUs has had a great impact on accelerating large-scale applications. However, on these architectures, inter-processor communication becomes a bottleneck for parallel algorithms such as the Fast Fourier Transform (FFT) and limits their scalability. In this paper, we present techniques for reducing multi-process communication cost during the computation of FFTs, considering hybrid network connections such as those expected on upcoming exascale machines. Our techniques include algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and the computational resources available. We present several experiments performed on the Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V100 GPUs.
KW - Exascale FFT
KW - Hybrid systems
KW - MPI tuning
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=85124654321&partnerID=8YFLogxK
U2 - 10.1109/ExaMPI54564.2021.00011
DO - 10.1109/ExaMPI54564.2021.00011
M3 - Conference contribution
AN - SCOPUS:85124654321
T3 - Proceedings of ExaMPI 2021: Workshop on Exascale MPI, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 46
EP - 53
BT - Proceedings of ExaMPI 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Workshop on Exascale MPI, ExaMPI 2021
Y2 - 14 November 2021
ER -