TY - GEN
T1 - sKokkos
T2 - 7th International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2024
AU - Valero-Lara, Pedro
AU - Lee, Seyong
AU - Denny, Joel
AU - Teranishi, Keita
AU - Vetter, Jeffrey
AU - Gonzalez-Tallada, Marc
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/1/18
Y1 - 2024/1/18
N2 - This paper presents a new feature to enable Kokkos with transparent device selection. For application developers, it is not easy to identify which device is the most appropriate to use in a heterogeneous system, since this depends on the characteristics of both the application and the hardware. In Kokkos, a backend is associated with one specific programming model/hardware. Programmers decide which backend to use at compilation time. This new feature implemented on the OpenACC backend eliminates the burden of deciding which device to use, providing a highly productive programming solution for Kokkos applications. This work includes implementation details and a performance study conducted with a set of mini-benchmarks (i.e., AXPY and dot product), kernels (Lattice-Boltzmann method), and two mini-apps (LULESH and miniFE) on two heterogeneous systems with different hardware capabilities. This new Kokkos feature provides high accelerations of up to 35× thanks to automatic and transparent device selection.
AB - This paper presents a new feature to enable Kokkos with transparent device selection. For application developers, it is not easy to identify which device is the most appropriate to use in a heterogeneous system, since this depends on the characteristics of both the application and the hardware. In Kokkos, a backend is associated with one specific programming model/hardware. Programmers decide which backend to use at compilation time. This new feature implemented on the OpenACC backend eliminates the burden of deciding which device to use, providing a highly productive programming solution for Kokkos applications. This work includes implementation details and a performance study conducted with a set of mini-benchmarks (i.e., AXPY and dot product), kernels (Lattice-Boltzmann method), and two mini-apps (LULESH and miniFE) on two heterogeneous systems with different hardware capabilities. This new Kokkos feature provides high accelerations of up to 35× thanks to automatic and transparent device selection.
KW - Auto-tuning
KW - C++ Metaprogramming
KW - CPU
KW - GPU
KW - Heterogeneous Systems
KW - Kokkos
KW - OpenACC
KW - Parallel Programming Models
UR - http://www.scopus.com/inward/record.url?scp=85184373097&partnerID=8YFLogxK
U2 - 10.1145/3635035.3635043
DO - 10.1145/3635035.3635043
M3 - Conference contribution
AN - SCOPUS:85184373097
T3 - ACM International Conference Proceeding Series
SP - 23
EP - 34
BT - Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2024
PB - Association for Computing Machinery
Y2 - 25 January 2024 through 27 January 2024
ER -