Towards native execution of deep learning on a leadership-class HPC system

Srikanth Yoginath, Maksudul Alam, Arvind Ramanathan, Debsindhu Bhowmik, Nouamane Laanait, Kalyan S. Perumalla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Large parallel machines generally offer the best parallel performance with 'native execution', achieved using codes developed with the optimized compilers, communication libraries, and runtimes offered on those machines. In this paper, we report and analyze performance results from native execution of deep learning on a leadership-class high-performance computing (HPC) system. Using our new code called DeepEx, we present a study of the parallel speedup and learning convergence rates achieved with native parallel execution. In the trade-off between computational parallelism and synchronized convergence, we first focus on maximizing parallelism while still obtaining convergence. Scaling results are reported from execution on up to 15,000 GPUs using two scientific data sets, from atom microscopy and protein folding applications, and also using the popular ImageNet data set. In terms of the traditional measure of parallel speedup, excellent scaling is observed up to 12,000 GPUs. Additionally, accounting for the convergence rate of deep learning accuracy or error, a deep learning-specific metric called 'learning speedup' is also tracked. The performance results indicate the need to evaluate parallel deep learning execution in terms of learning speedup, and point to additional directions for improved exploitation of high-end HPC systems.
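
The abstract contrasts traditional parallel speedup with a convergence-aware 'learning speedup'. As a minimal illustration, the Python sketch below assumes the common definitions (parallel speedup as a throughput ratio; learning speedup as the ratio of wall-clock times to reach a fixed target accuracy). The paper's exact formulas are not reproduced here, and all numbers in the sketch are hypothetical, not results from the paper.

    # Minimal sketch contrasting the two speedup metrics named in the abstract.
    # Assumed definitions (not taken verbatim from the paper):
    #   parallel speedup = throughput(N GPUs) / throughput(baseline)
    #   learning speedup = time-to-target-accuracy(baseline) / time-to-target-accuracy(N GPUs)

    def parallel_speedup(throughput_n: float, throughput_base: float) -> float:
        """Traditional speedup from raw training throughput (e.g., samples/s)."""
        return throughput_n / throughput_base

    def learning_speedup(time_to_target_base: float, time_to_target_n: float) -> float:
        """Convergence-aware speedup: wall-clock time to reach a fixed target
        accuracy at the baseline scale vs. at N GPUs. More parallelism can raise
        throughput while slowing per-step convergence, so this metric can lag
        the parallel speedup."""
        return time_to_target_base / time_to_target_n

    # Hypothetical numbers for illustration only:
    print(parallel_speedup(throughput_n=9.0e5, throughput_base=1.0e2))         # 9000.0
    print(learning_speedup(time_to_target_base=3.0e5, time_to_target_n=60.0))  # 5000.0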

Original language: English
Title of host publication: Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 941-950
Number of pages: 10
ISBN (Electronic): 9781728135106
DOIs
State: Published - May 2019
Event: 33rd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019 - Rio de Janeiro, Brazil
Duration: May 20, 2019 – May 24, 2019

Publication series

Name: Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019

Conference

Conference: 33rd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
Country/Territory: Brazil
City: Rio de Janeiro
Period: 05/20/19 – 05/24/19

Funding

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

Funders:
• U.S. Department of Energy
• Office of Science

Keywords

• Deep Learning
• Learning Speedup
• Massively Parallel Systems
• Parallel Speedup
