Abstract
We present a portable and highly optimized Deep Neural Network (DNN) algorithm and its implementation techniques. Our approach is a novel combination of existing HPC techniques that methodically applies autotuning together with data-layout and low-level optimizations. It achieves performance that matches or exceeds what is possible with either reverse engineering and manual assembly coding (as done in the maxDNN implementation) or proprietary vendor libraries (represented by cuDNN). Our work applies directly to the most time-consuming part of the DNN workflow, namely training, which often has to be restarted when it stagnates, for example because of diminishing gradients or getting stuck in local minima. Performance tests on a consumer-grade GPU with the latest High Bandwidth Memory (HBM) stack show that our methodology can match server-grade hardware at a fraction of the price. A further tuning sweep on a new GPU architecture from a different vendor also attests to the portability of our approach and the quality of our implementation.
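The abstract names autotuning as the core of the method but does not show what a tuning sweep involves. The following is a minimal sketch of a generic exhaustive sweep, not the authors' actual tuning infrastructure. The sweep enumerates a space of kernel parameters, prunes configurations that violate a hardware constraint before benchmarking them, and keeps the cheapest survivor. Everything problem-specific here is a hypothetical placeholder: the parameter names (`tile_m`, `tile_n`, `tile_k`), the 48 KiB shared-memory budget, the problem size, and the `toy_cost` model, which stands in for real kernel timings.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, int]


def autotune(space: Dict[str, Iterable[int]],
             is_valid: Callable[[Config], bool],
             measure: Callable[[Config], float]) -> Tuple[Optional[Config], float, int]:
    """Exhaustively sweep `space`, skip configs rejected by `is_valid`,
    and return (best_config, best_cost, number_of_configs_measured)."""
    names = list(space)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    measured = 0
    for values in product(*(list(space[n]) for n in names)):
        cfg = dict(zip(names, values))
        if not is_valid(cfg):      # prune before paying for a benchmark run
            continue
        measured += 1
        cost = measure(cfg)
        if cost < best_cost:       # strict '<' keeps the first config on ties
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost, measured


# --- Hypothetical problem and constraints, for illustration only ---
M = N = K = 1024                   # assumed GEMM-like problem size
SMEM_LIMIT = 48 * 1024             # assumed per-block shared-memory budget (bytes)
BYTES_PER_ELEM = 4                 # single-precision floats


def fits_shared_memory(cfg: Config) -> bool:
    # One tile of each input operand is staged in shared memory per k-step.
    tile_bytes = BYTES_PER_ELEM * (cfg["tile_m"] * cfg["tile_k"]
                                   + cfg["tile_k"] * cfg["tile_n"])
    return tile_bytes <= SMEM_LIMIT


def toy_cost(cfg: Config) -> int:
    # Hypothetical stand-in for a measured kernel time: global-memory traffic
    # (larger tiles reuse more data) plus a made-up fixed cost per k-step.
    tm, tn, tk = cfg["tile_m"], cfg["tile_n"], cfg["tile_k"]
    traffic = K * M * N // tn + K * M * N // tm
    steps = (M // tm) * (N // tn) * (K // tk)
    return traffic + 1000 * steps


SPACE = {"tile_m": [32, 64, 128], "tile_n": [32, 64, 128], "tile_k": [8, 16, 32, 64]}

if __name__ == "__main__":
    best, cost, n = autotune(SPACE, fits_shared_memory, toy_cost)
    print(best, cost, n)
```

Under this toy model only one of the 36 configurations (128 × 128 × 64) exceeds the assumed shared-memory budget, so 35 are measured. The winner balances data reuse against per-step overhead. In a real sweep, `measure` would launch and time the actual kernel on the target GPU. That is why cheap validity pruning matters when the space is large.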
Original language | English |
---|---|
Title of host publication | Proceedings of MLHPC 2016 |
Subtitle of host publication | Machine Learning in HPC Environments - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 9-18 |
Number of pages | 10 |
ISBN (Electronic) | 9781509038824 |
DOIs | |
State | Published - Jan 27 2017 |
Externally published | Yes |
Event | 2016 Machine Learning in HPC Environments, MLHPC 2016 - Salt Lake City, United States Duration: Nov 14 2016 → … |
Publication series
Name | Proceedings of MLHPC 2016: Machine Learning in HPC Environments - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis |
---|---|
Conference
Conference | 2016 Machine Learning in HPC Environments, MLHPC 2016 |
---|---|
Country/Territory | United States |
City | Salt Lake City |
Period | 11/14/16 → … |
Funding
This work was funded by NSF through Award Number 1439052. We would like to thank the BEAST team for help with autotuning sweeps.