Abstract
In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems. As a benchmark, we propose three different methods for solving the DGEMM operation on tiled matrices, which might be the most popular benchmark for performance analysis. For the algorithms that we study, we present significant differences in terms of data dependencies, synchronization and granularity. The main contribution of this work is determining which of these approaches works better for running multiple tasks concurrently on a single GPU, as well as stating the main limitations and benefits of each technique. Using dynamic parallelism and CUDA Streams we were able to achieve speedups of up to 30%, and with the CUDA Graph API up to 25x acceleration, outperforming state-of-the-art results.
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 |
| Editors | Hui Tian, Hong Shen, Wee Lum Tan |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 127-132 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781728126166 |
| State | Published - Dec 2019 |
| Externally published | Yes |
| Event | 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia. Duration: Dec 5 2019 → Dec 7 2019 |
Publication series
| Name | Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 |
|---|---|
Conference
| Conference | 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 |
|---|---|
| Country/Territory | Australia |
| City | Gold Coast |
| Period | 12/5/19 → 12/7/19 |
Funding
This project has received funding from the EPEEC project from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051, from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), and from the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051). Finally, this project also received funding from the Spanish Ministry of Economy and Competitiveness under the Juan de la Cierva Grant Agreement No IJCI-2017-33511, and from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 749516.
Keywords
- CUDA
- CUDA graph
- CUDA stream
- Dynamic parallelism
- GPU