Tasking in accelerators: Performance evaluation

Leonel Toledo, Antonio J. Pena, Sandra Catalan, Pedro Valero-Lara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels, and CUDA Graphs to solve task-oriented problems. As a benchmark, we propose three different methods for solving the DGEMM operation on tiled matrices, which might be the most popular benchmark for performance analysis. The algorithms that we study differ significantly in terms of data dependencies, synchronization, and granularity. The main contribution of this work is determining which of these approaches works better for running multiple tasks concurrently on a single GPU, as well as stating the main limitations and benefits of each technique. Using dynamic parallelism and CUDA Streams, we achieved speedups of up to 30%, and with the CUDA Graph API up to 25x acceleration, outperforming state-of-the-art results.
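
To make the benchmark concrete, the following is a minimal sketch of how a tiled DGEMM (C = alpha·A·B + beta·C) decomposes into tasks. It is not the authors' implementation: the function names (`make_tile_tasks`, `run_tile_task`, `tiled_dgemm`, `reference_dgemm`), the tile size, and the sequential execution order are assumptions made for illustration, and the code runs on the CPU in plain Python. Each task updates one tile of C from one tile of A and one tile of B. Tasks that target different C tiles are independent, while tasks that target the same C tile accumulate into it and therefore carry a data dependency.

```python
from itertools import product


def make_tile_tasks(n, tile):
    """Return one task per (i, j, k) tile triple for an n x n problem.

    Tasks sharing the same (i, j) write the same C tile, so they depend
    on each other; tasks with different (i, j) are independent.
    """
    steps = range(0, n, tile)
    return [(i, j, k) for i, j in product(steps, steps) for k in steps]


def run_tile_task(A, B, C, task, tile, alpha):
    """Accumulate alpha * A_tile * B_tile into the matching C tile."""
    i0, j0, k0 = task
    n = len(A)
    for i in range(i0, min(i0 + tile, n)):
        for j in range(j0, min(j0 + tile, n)):
            acc = 0.0
            for k in range(k0, min(k0 + tile, n)):
                acc += A[i][k] * B[k][j]
            C[i][j] += alpha * acc


def tiled_dgemm(A, B, C, alpha=1.0, beta=0.0, tile=2):
    """Compute C = alpha*A*B + beta*C in place, one tile task at a time."""
    n = len(A)
    # Scale C by beta once, before any tile task accumulates into it.
    for row in C:
        for j in range(n):
            row[j] *= beta
    # Hypothetical sequential schedule; a GPU runtime could overlap the
    # independent tasks instead.
    for task in make_tile_tasks(n, tile):
        run_tile_task(A, B, C, task, tile, alpha)
    return C


def reference_dgemm(A, B, C, alpha=1.0, beta=0.0):
    """Untiled reference used only to check the tiled version."""
    n = len(A)
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(n)) + beta * C[i][j]
         for j in range(n)]
        for i in range(n)
    ]
```

The tile size sets the granularity: smaller tiles produce more, shorter tasks and more dependencies to track, which is the kind of trade-off the abstract refers to. On a GPU, each task could plausibly be issued as a kernel in a stream or as a node in a graph, but that mapping is not shown here.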

Original language: English
Title of host publication: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Editors: Hui Tian, Hong Shen, Wee Lum Tan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 127-132
Number of pages: 6
ISBN (Electronic): 9781728126166
DOIs
State: Published - Dec 2019
Externally published: Yes
Event: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia
Duration: Dec 5 2019 → Dec 7 2019

Publication series

Name: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019

Conference

Conference: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Country/Territory: Australia
City: Gold Coast
Period: 12/5/19 → 12/7/19

Funding

This project has received funding from the EPEEC project from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051, from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), and from the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051). Finally, this project also received funding from the Spanish Ministry of Economy and Competitiveness under the Juan de la Cierva Grant Agreement No IJCI-2017-33511, and from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 749516.

Funder: Funder number
Computación de Altas Prestaciones VII: TIN2015-65316-P
European Union's Horizon 2020 Research and Innovation Program
Spanish Ministry of Economy and Competitiveness
Horizon 2020 Framework Programme
Generalitat de Catalunya: IJCI-2017-33511, 2014-SGR-1051
Departament d'Innovació, Universitats i Empresa, Generalitat de Catalunya
Ministerio de Economía y Competitividad
Horizon 2020: 801051, 749516

    Keywords

    • CUDA
    • CUDA Graph
    • CUDA Stream
    • Dynamic parallelism
    • GPU
