Abstract
Large parallel machines generally deliver their best performance through 'native execution', i.e., codes built with the optimized compilers, communication libraries, and runtimes offered on those machines. In this paper, we report and analyze performance results from native execution of deep learning on a leadership-class high-performance computing (HPC) system. Using our new code called DeepEx, we present a study of the parallel speedup and convergence rates of learning achieved with native parallel execution. In the trade-off between computational parallelism and synchronized convergence, we first focus on maximizing parallelism while still obtaining convergence. Scaling results are reported from execution on up to 15,000 GPUs using two scientific data sets from atom microscopy and protein folding applications, as well as the popular ImageNet data set. In terms of the traditional measure of parallel speedup, excellent scaling is observed up to 12,000 GPUs. Additionally, to account for the convergence rate of deep learning accuracy or error, a deep learning-specific metric called 'learning speedup' is also tracked. The performance results indicate the need to evaluate parallel deep learning execution in terms of learning speedup, and point to additional directions for improved exploitation of high-end HPC systems.
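The record does not spell out how the 'learning speedup' metric is computed. As a rough illustration only, a minimal sketch is given below, assuming learning speedup compares wall-clock time to reach a fixed target accuracy (rather than time per epoch, as in traditional parallel speedup). All function names and numbers are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): contrasting traditional
# parallel speedup with a convergence-aware "learning speedup" metric.

def parallel_speedup(epoch_time_1gpu: float, epoch_time_ngpu: float) -> float:
    """Traditional speedup: ratio of per-epoch (or per-step) wall-clock times."""
    return epoch_time_1gpu / epoch_time_ngpu


def learning_speedup(time_to_target_1gpu: float, time_to_target_ngpu: float) -> float:
    """Convergence-aware speedup: ratio of total wall-clock times needed to reach
    the same target accuracy, so slower convergence at scale is penalized."""
    return time_to_target_1gpu / time_to_target_ngpu


if __name__ == "__main__":
    # Hypothetical measurements: per-epoch time drops nearly linearly with GPUs,
    # but more epochs are needed to hit the target accuracy at high parallelism.
    epoch_time_1gpu, epochs_1gpu = 3600.0, 90     # seconds per epoch, epochs to target
    epoch_time_ngpu, epochs_ngpu = 0.4, 150

    print(parallel_speedup(epoch_time_1gpu, epoch_time_ngpu))          # ~9000x
    print(learning_speedup(epoch_time_1gpu * epochs_1gpu,
                           epoch_time_ngpu * epochs_ngpu))             # ~5400x
```

In this made-up example the two metrics diverge precisely because convergence slows at large scale, which is the distinction the abstract highlights.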
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 941-950 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781728135106 |
| DOIs | |
| State | Published - May 2019 |
| Event | 33rd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019, Rio de Janeiro, Brazil; May 20 2019 → May 24 2019 |
Publication series
| Name | Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019 |
|---|---|
Conference
| Conference | 33rd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019 |
|---|---|
| Country/Territory | Brazil |
| City | Rio de Janeiro |
| Period | 05/20/19 → 05/24/19 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Keywords
- Deep Learning
- Learning Speedup
- Massively Parallel Systems
- Parallel Speedup