Abstract
In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems. As a benchmark, we propose three different methods for solving the DGEMM operation on tiled matrices, which might be the most popular benchmark for performance analysis. The algorithms that we study differ significantly in terms of data dependencies, synchronization and granularity. The main contribution of this work is determining which of these approaches works better for running multiple tasks concurrently on a single GPU, as well as stating the main limitations and benefits of each technique. Using dynamic parallelism and CUDA Streams, we achieved speedups of up to 30%, and with the CUDA Graph API up to 25x acceleration, outperforming state-of-the-art results.
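To make the task structure of a tiled DGEMM concrete, the following is a minimal CPU-only sketch in plain Python. It is not the authors' code and does not use CUDA; all names (`gemm_tile`, `tiled_dgemm_tasks`, `tiled_dgemm`, `reference_dgemm`) are hypothetical and chosen only for illustration. It assumes square matrices whose size is a multiple of the tile size. The point it illustrates is the dependency pattern: tasks that write different output tiles are independent, while tasks that accumulate into the same output tile form a chain.

```python
# Illustrative sketch only: tiled DGEMM, C = alpha * A @ B + beta * C,
# decomposed into per-tile tasks. All names are hypothetical.

def make_matrix(n, f):
    """Build an n x n list-of-lists matrix with entries f(i, j) as floats."""
    return [[float(f(i, j)) for j in range(n)] for i in range(n)]

def gemm_tile(A, B, C, i0, j0, k0, t, alpha):
    """One task: accumulate alpha * A[i0.., k0..] @ B[k0.., j0..] into the
    t x t tile of C that starts at (i0, j0)."""
    for i in range(i0, i0 + t):
        for j in range(j0, j0 + t):
            acc = 0.0
            for k in range(k0, k0 + t):
                acc += A[i][k] * B[k][j]
            C[i][j] += alpha * acc

def tiled_dgemm_tasks(n, t):
    """Group tasks by output tile. Chains for different output tiles are
    independent; tasks inside one chain update the same tile of C."""
    assert n % t == 0, "sketch assumes n is a multiple of the tile size"
    return {
        (i0, j0): [(i0, j0, k0) for k0 in range(0, n, t)]
        for i0 in range(0, n, t)
        for j0 in range(0, n, t)
    }

def tiled_dgemm(A, B, C, t, alpha=1.0, beta=1.0):
    """Run every task sequentially; C is updated in place and returned."""
    n = len(A)
    for row in C:                      # scale C by beta once, up front
        for j in range(n):
            row[j] *= beta
    for chain in tiled_dgemm_tasks(n, t).values():
        for (i0, j0, k0) in chain:     # tasks sharing an output tile
            gemm_tile(A, B, C, i0, j0, k0, t, alpha)
    return C

def reference_dgemm(A, B, C, alpha=1.0, beta=1.0):
    """Untiled reference result, returned as a new matrix."""
    n = len(A)
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(n)) + beta * C[i][j]
         for j in range(n)]
        for i in range(n)
    ]
```

In this sketch the tile size `t` sets the task granularity: smaller tiles give more, finer tasks and longer dependency chains per output tile. On a GPU, each independent chain is the kind of unit that could be assigned to a stream, a child kernel launch, or a branch of a graph; how the paper maps tasks to these mechanisms is not shown here.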
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 |
Editors | Hui Tian, Hong Shen, Wee Lum Tan |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 127-132 |
Number of pages | 6 |
ISBN (Electronic) | 9781728126166 |
State | Published - Dec 2019 |
Externally published | Yes |
Event | 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia. Duration: Dec 5 2019 → Dec 7 2019 |
Publication series
Name | Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 |
---|
Conference
Conference | 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 |
---|---|
Country/Territory | Australia |
City | Gold Coast |
Period | 12/5/19 → 12/7/19 |
Funding
This project has received funding from the EPEEC project from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051, from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P) and from the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051). Finally, this project also received funding from the Spanish Ministry of Economy and Competitiveness under the Juan de la Cierva Grant Agreement No IJCI-2017-33511, and from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 749516.
Funders | Funder number |
---|---|
Computación de Altas Prestaciones VII | TIN2015-65316-P |
European Union’s Horizon 2020 Research and Innovation Program | |
Spanish Ministry of Economy and Competitiveness | |
Horizon 2020 Framework Programme | |
Generalitat de Catalunya | IJCI-2017-33511, 2014-SGR-1051 |
Departament d'Innovació, Universitats i Empresa, Generalitat de Catalunya | |
Ministerio de Economía y Competitividad | |
Horizon 2020 | 801051, 749516 |
Keywords
- CUDA
- CUDA Graph
- CUDA Stream
- Dynamic parallelism
- GPU