TY - GEN
T1 - sKokkos
T2 - 5th International Conference on Big-data Service and Intelligent Computation
AU - Valero-Lara, Pedro
AU - Lee, Seyong
AU - Denny, Joel
AU - Teranishi, Keita
AU - Vetter, Jeffrey S.
AU - Gonzalez-Tallada, Marc
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/10/20
Y1 - 2023/10/20
N2 - This paper presents a new feature to enable Kokkos with transparent device selection. For application developers, it is not easy to identify which device is the most appropriate to use in a heterogeneous system, since this depends on the characteristics of both the application and the hardware. In Kokkos, a backend is associated with one specific programming model/hardware. Programmers decide which backend to use at compilation time. This new feature implemented on the OpenACC backend eliminates the burden of deciding which device to use, providing a highly productive programming solution for Kokkos applications. This work includes implementation details and a performance study conducted with a set of mini-benchmarks (i.e., AXPY and dot product), kernels (Lattice-Boltzmann method), and two mini-apps (LULESH and miniFE) on two heterogeneous systems with different hardware capabilities. This new Kokkos feature provides high accelerations of up to 35× thanks to automatic and transparent device selection.
AB - This paper presents a new feature to enable Kokkos with transparent device selection. For application developers, it is not easy to identify which device is the most appropriate to use in a heterogeneous system, since this depends on the characteristics of both the application and the hardware. In Kokkos, a backend is associated with one specific programming model/hardware. Programmers decide which backend to use at compilation time. This new feature implemented on the OpenACC backend eliminates the burden of deciding which device to use, providing a highly productive programming solution for Kokkos applications. This work includes implementation details and a performance study conducted with a set of mini-benchmarks (i.e., AXPY and dot product), kernels (Lattice-Boltzmann method), and two mini-apps (LULESH and miniFE) on two heterogeneous systems with different hardware capabilities. This new Kokkos feature provides high accelerations of up to 35× thanks to automatic and transparent device selection.
KW - Auto-tuning
KW - C++ Metaprogramming
KW - CPU
KW - GPU
KW - Heterogeneous Systems
KW - Kokkos
KW - OpenACC
KW - Parallel Programming Models
UR - http://www.scopus.com/inward/record.url?scp=85184522958&partnerID=8YFLogxK
U2 - 10.1145/3635035.3635043
DO - 10.1145/3635035.3635043
M3 - Conference contribution
AN - SCOPUS:85184522958
T3 - ACM International Conference Proceeding Series
SP - 23
EP - 34
BT - BDSIC2023 - 2023 5th International Conference on Big-data Service and Intelligent Computation
PB - Association for Computing Machinery
Y2 - 20 October 2023 through 22 October 2023
ER -