TY - GEN
T1 - A More Portable HeFFTe
T2 - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
AU - Sharp, Daniel
AU - Stoyanov, Miroslav
AU - Tomov, Stanimire
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - The Highly Efficient Fast Fourier Transform for Exascale (heFFTe) numerical library is a C++ implementation of distributed multidimensional FFTs targeting heterogeneous and scalable systems. To date, the library has relied on users to provide at least one installation from a selection of well-known libraries for the single node/MPI-rank one-dimensional FFT calculations that heFFTe is built on. In this paper, we describe the development of a CPU-based backend to heFFTe as a reference, or "stock", implementation. This allows the user to install and run heFFTe without any external dependencies that may include restrictive licensing or mandate specific hardware. Furthermore, this stock backend was implemented to take advantage of SIMD capabilities on modern CPUs, and includes both a custom vectorized complex data type and a run-time-generated call graph for selecting which specific FFT algorithm to call. The performance of this backend greatly increases when vectorized instructions are available and, when vectorized, it provides reasonable scalability in both performance and accuracy compared to an alternative CPU-based FFT backend. In particular, we illustrate a highly performant $\mathcal{O}(N\log N)$ code that is about 10× faster than non-vectorized code for the complex arithmetic, and a scalability that matches heFFTe's scalability when used with vendor or other highly optimized 1D FFT backends. The same technology can be used to derive other Fourier-related transformations that may not even be available in vendor libraries, e.g., the discrete sine (DST) or cosine (DCT) transforms, as well as their extension to multiple dimensions and $\mathcal{O}(N\log N)$ timing.
AB - The Highly Efficient Fast Fourier Transform for Exascale (heFFTe) numerical library is a C++ implementation of distributed multidimensional FFTs targeting heterogeneous and scalable systems. To date, the library has relied on users to provide at least one installation from a selection of well-known libraries for the single node/MPI-rank one-dimensional FFT calculations that heFFTe is built on. In this paper, we describe the development of a CPU-based backend to heFFTe as a reference, or "stock", implementation. This allows the user to install and run heFFTe without any external dependencies that may include restrictive licensing or mandate specific hardware. Furthermore, this stock backend was implemented to take advantage of SIMD capabilities on modern CPUs, and includes both a custom vectorized complex data type and a run-time-generated call graph for selecting which specific FFT algorithm to call. The performance of this backend greatly increases when vectorized instructions are available and, when vectorized, it provides reasonable scalability in both performance and accuracy compared to an alternative CPU-based FFT backend. In particular, we illustrate a highly performant $\mathcal{O}(N\log N)$ code that is about 10× faster than non-vectorized code for the complex arithmetic, and a scalability that matches heFFTe's scalability when used with vendor or other highly optimized 1D FFT backends. The same technology can be used to derive other Fourier-related transformations that may not even be available in vendor libraries, e.g., the discrete sine (DST) or cosine (DCT) transforms, as well as their extension to multiple dimensions and $\mathcal{O}(N\log N)$ timing.
UR - http://www.scopus.com/inward/record.url?scp=85123502522&partnerID=8YFLogxK
U2 - 10.1109/HPEC49654.2021.9622811
DO - 10.1109/HPEC49654.2021.9622811
M3 - Conference contribution
AN - SCOPUS:85123502522
T3 - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
BT - 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 September 2021 through 24 September 2021
ER -