TY - GEN
T1 - Evaluating the Performance and Portability of Contemporary SYCL Implementations
AU - Johnston, Beau
AU - Vetter, Jeffrey S.
AU - Milthorpe, Josh
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
AB - SYCL is a single-source programming model for heterogeneous systems; it promises improved maintainability, productivity, and opportunity for compiler optimization compared to accelerator-specific programming models. Several implementations of the SYCL standard have been developed over the past few years, including several backends using contemporary accelerator languages such as OpenCL, CUDA, and HIP. These implementations vary widely in their support for specific features of the standard and in their performance. As SYCL grows in popularity, developers need to know how features are implemented across popular implementations in order to make proper design choices. In this paper, we evaluate the existing SYCL implementations for important SYCL features across a range of hardware in order to understand SYCL's performance and portability. This work uses the newest SYCL benchmark suite (SYCL-Bench, 38 kernels) to evaluate four existing implementations, comparing support of language features across backends and highlighting feature completeness and performance. For features, we focus on the five major SYCL parallel constructs, using the matrix multiplication benchmark as a motivating example. Our results show that the basic data parallelism construct is the best choice for performance on current SYCL implementations, and we identify opportunities for improvement in several of the SYCL implementations.
UR - http://www.scopus.com/inward/record.url?scp=85099687605&partnerID=8YFLogxK
U2 - 10.1109/P3HPC51967.2020.00010
DO - 10.1109/P3HPC51967.2020.00010
M3 - Conference contribution
AN - SCOPUS:85099687605
T3 - Proceedings of P3HPC 2020: International Workshop on Performance, Portability, and Productivity in HPC, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 45
EP - 56
BT - Proceedings of P3HPC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE/ACM International Workshop on Performance, Portability, and Productivity in HPC, P3HPC 2020
Y2 - 13 November 2020
ER -