Abstract
Because SYCL is a portable programming model for heterogeneous computing, it is important that SYCL programs also achieve reasonable performance portability. To better understand and improve the performance portability of SYCL for machine learning workloads, we have been developing benchmarks for basic operators in deep neural networks (DNNs). These operators can be offloaded to heterogeneous computing devices such as graphics processing units (GPUs) to accelerate computation. In this paper, we introduce the benchmarks, evaluate the performance of the operators on GPU-based systems, and describe the causes of the performance gap between the SYCL and Compute Unified Device Architecture (CUDA) kernels. We find that these causes are related to the use of the texture cache for read-only data, the optimization of memory accesses with strength reduction, the use of local memory, and register usage per thread. We hope that this effort to develop benchmarks for studying performance portability will stimulate discussion and interaction within the community.
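One of the causes named above is the optimization of memory accesses with strength reduction. The following is a minimal CPU-side sketch in C++ of what that transformation means. It is not taken from the paper's SYCL or CUDA benchmark kernels, and the function names are hypothetical. Both functions sum one column of a row-major matrix. The first recomputes the address with a multiply on every iteration. The second replaces that multiply with the addition of a constant stride to a running offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums column `col` of a row-major rows x cols matrix.
// The index row * cols + col costs a multiply and an add per iteration.
float sum_column_naive(const std::vector<float>& a, std::size_t rows,
                       std::size_t cols, std::size_t col) {
    assert(col < cols && a.size() >= rows * cols);  // precondition check
    float sum = 0.0f;
    for (std::size_t row = 0; row < rows; ++row) {
        sum += a[row * cols + col];
    }
    return sum;
}

// Hypothetical helper: same result, but strength-reduced. The per-iteration
// multiply is replaced by adding the row stride to a running offset.
float sum_column_reduced(const std::vector<float>& a, std::size_t rows,
                         std::size_t cols, std::size_t col) {
    assert(col < cols && a.size() >= rows * cols);  // precondition check
    float sum = 0.0f;
    std::size_t offset = col;  // address of element (0, col)
    for (std::size_t row = 0; row < rows; ++row) {
        sum += a[offset];
        offset += cols;  // move down one row: an add instead of a multiply
    }
    return sum;
}
```

A compiler may or may not apply this rewrite on its own. As a result, source code that looks equivalent can lead to different address arithmetic in the generated kernels.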
| Original language | English |
|---|---|
| Title of host publication | Languages and Compilers for Parallel Computing - 36th International Workshop, LCPC 2023, Revised Selected Papers |
| Editors | Henry Dietz |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 33-45 |
| Number of pages | 13 |
| ISBN (Print) | 9783032024350 |
| State | Published - 2026 |
| Event | 36th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2023 - Lexington, United States. Duration: Oct 11 2023 → Oct 13 2023 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 14480 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 36th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2023 |
|---|---|
| Country/Territory | United States |
| City | Lexington |
| Period | 10/11/23 → 10/13/23 |
Funding
We appreciate the reviewers’ comments and suggestions. This research used resources of the Experimental Computing Laboratory at the Oak Ridge National Laboratory. This manuscript has been authored by UT-Battelle LLC under contract no. DE-AC05-00OR22725 with the US Department of Energy. The publisher, by accepting the article for publication, acknowledges that the US government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for US government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan.
Keywords
- Benchmarks
- DNN operators
- Performance Portability