Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training

Sunwoo Lee, Ankit Agrawal, Prasanna Balaprakash, Alok Choudhary, Wei Keng Liao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Training Convolutional Neural Network (CNN) models is extremely time-consuming, and the efficiency of its parallelization plays a key role in finishing the training in a reasonable amount of time. The well-known synchronous Stochastic Gradient Descent (SGD) algorithm suffers from high costs of inter-process communication and synchronization. To address these problems, the asynchronous SGD algorithm employs a master-slave model for parameter updates. However, it can result in a poor convergence rate due to gradient staleness. In addition, the master-slave model is not scalable when running on a large number of compute nodes. In this paper, we present a communication-efficient gradient averaging algorithm for synchronous SGD, which adopts a few design strategies to maximize the degree of overlap between computation and communication. A time complexity analysis shows that our algorithm outperforms the traditional allreduce-based algorithm. Training two popular deep CNN models, VGG-16 and ResNet-50, on the ImageNet dataset, our experiments on Cori Phase-I, a Cray XC40 supercomputer at NERSC, show that our algorithm achieves a 2516.36× speedup for VGG-16 and a 2734.25× speedup for ResNet-50 using up to 8192 cores.
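
The abstract describes layer-wise gradient averaging that overlaps inter-process communication with back-propagation. The sketch below is not the authors' implementation; it is only a minimal illustration of that overlapping idea using non-blocking MPI allreduce via mpi4py, and the names "layers", "backprop_layer", and "layer.grad" are hypothetical placeholders assumed for the example.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    nprocs = comm.Get_size()

    def average_gradients_overlapped(layers, backprop_layer):
        # Back-propagate from the last layer to the first; as soon as a layer's
        # gradient is ready, start a non-blocking allreduce so communication for
        # layer i overlaps with gradient computation for layer i-1.
        pending = []
        for layer in reversed(layers):
            grad = backprop_layer(layer)           # local gradient (NumPy array)
            summed = np.empty_like(grad)
            req = comm.Iallreduce(grad, summed, op=MPI.SUM)
            pending.append((req, layer, grad, summed))  # keep buffers alive until Wait
        for req, layer, _, summed in pending:
            req.Wait()                             # complete the outstanding reduction
            layer.grad = summed / nprocs           # averaged gradient across workers

Issuing the reduction per layer in this manner hides most of the allreduce latency behind the remaining back-propagation work, which is the kind of computation-communication overlap the abstract attributes to the proposed algorithm.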

Original language: English
Title of host publication: Proceedings of MLHPC 2018
Subtitle of host publication: Machine Learning in HPC Environments, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 47-56
Number of pages: 10
ISBN (Electronic): 9781728101804
DOIs
State: Published - Jul 2 2018
Externally published: Yes
Event: 2018 IEEE/ACM Machine Learning in HPC Environments, MLHPC 2018 - Dallas, United States
Duration: Nov 12 2018 → …

Publication series

Name: Proceedings of MLHPC 2018: Machine Learning in HPC Environments, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2018 IEEE/ACM Machine Learning in HPC Environments, MLHPC 2018
Country/Territory: United States
City: Dallas
Period: 11/12/18 → …

Funding

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. This work is also supported in part by NSF award CCF-1409601, DOE awards DE-SC0007456 and DE-SC0014330, and NIST award 70NANB14H012. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan

Funders | Funder number
National Science Foundation | CCF-1409601
U.S. Department of Energy | DE-AC02-06CH11357, DE-SC0014330, DE-SC0007456
National Institute of Standards and Technology | 70NANB14H012
Office of Science | DE-AC02-05CH11231
Advanced Scientific Computing Research

Keywords

• Convolutional Neural Network
• Deep Learning
• Distributed-Memory Parallelization
• Parallelization
