Performance-portable autotuning of OpenCL kernels for convolutional layers of deep neural networks

Yaohung M. Tsai, Piotr Luszczek, Jakub Kurzak, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

We present a portable and highly optimized Deep Neural Network (DNN) algorithm and its implementation techniques. Our approach is a novel combination of existing HPC techniques that methodically applies autotuning together with data-layout and low-level optimizations, achieving performance that matches or exceeds what is possible with either reverse engineering and manual assembly coding or proprietary vendor libraries. The former was done in the maxDNN implementation and the latter is represented by cuDNN. Our work applies directly to the most time-consuming part of the DNN workflow, namely the training process, which often needs a restart when it stagnates due to, for example, diminishing gradients or getting stuck in local minima. Based on performance tests on a consumer-grade GPU with the latest High Bandwidth Memory (HBM) stack, our methodology can match server-grade hardware at a fraction of the price. Another tuning sweep on a new GPU architecture from a different vendor also attests to the portability of our approach and the quality of our implementation.
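The autotuning methodology described in the abstract amounts to sweeping a parameterized OpenCL convolution kernel over a space of tuning knobs and pruning infeasible points before benchmarking. The sketch below illustrates that idea only; the parameter names, ranges, and the local-memory constraint are illustrative assumptions, not the paper's actual search space:

```python
import itertools

# Illustrative tuning space for a tiled convolution kernel.
# Parameter names and ranges are assumptions, not the paper's.
TUNING_SPACE = {
    "tile_m": [8, 16, 32, 64],   # output tile height per work-group
    "tile_n": [8, 16, 32, 64],   # output tile width per work-group
    "vector_width": [1, 2, 4],   # float vector width for loads/stores
    "unroll": [1, 2, 4],         # inner-loop unroll factor
}

LOCAL_MEM_BYTES = 16 * 1024  # assumed local-memory budget for the sketch


def feasible(cfg):
    """Prune configurations whose staged input tile exceeds local memory.

    Assumes a 3x3 convolution: each work-group stages a
    (tile_m + 2) x (tile_n + 2) halo region of float32 inputs.
    """
    tile_bytes = (cfg["tile_m"] + 2) * (cfg["tile_n"] + 2) * 4
    return tile_bytes <= LOCAL_MEM_BYTES


def enumerate_configs(space):
    """Yield every feasible point of the Cartesian tuning space."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if feasible(cfg):
            yield cfg


configs = list(enumerate_configs(TUNING_SPACE))
print(f"{len(configs)} feasible configurations to benchmark")
```

In a real sweep, each surviving configuration would be used to instantiate and compile the OpenCL kernel, time it on the target device, and keep the fastest variant; rerunning the same sweep on a different GPU is what makes the approach performance-portable.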

Original language: English
Title of host publication: Proceedings of MLHPC 2016
Subtitle of host publication: Machine Learning in HPC Environments - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9-18
Number of pages: 10
ISBN (Electronic): 9781509038824
DOIs
State: Published - Jan 27 2017
Externally published: Yes
Event: 2016 Machine Learning in HPC Environments, MLHPC 2016 - Salt Lake City, United States
Duration: Nov 14 2016 → …

Publication series

Name: Proceedings of MLHPC 2016: Machine Learning in HPC Environments - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2016 Machine Learning in HPC Environments, MLHPC 2016
Country/Territory: United States
City: Salt Lake City
Period: 11/14/16 → …

Funding

This work was funded by NSF through Award Number 1439052. We would like to thank the BEAST team for help with autotuning sweeps.

Funder: National Science Foundation
Funder number: 1439052
