An Efficient, Distributed Stochastic Gradient Descent Algorithm for Deep-Learning Applications

Guojing Cong, Onkar Bhardwaj, Minwei Feng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

Parallel and distributed processing is employed to accelerate training for many deep-learning applications with large models and inputs. Asynchronous stochastic gradient descent (ASGD), derived from stochastic gradient descent (SGD), is widely used because it reduces synchronization and communication overhead by tolerating stale gradient updates. Recent theoretical analyses show that ASGD converges with linear asymptotic speedup over SGD. Often glossed over in these analyses, however, are the communication overhead and the practical learning rates that are critical to ASGD's performance. After analyzing the communication performance and convergence behavior of ASGD, using the Downpour algorithm as an example, we demonstrate the challenges ASGD faces in achieving good practical speedup over SGD. We propose a distributed, bulk-synchronous stochastic gradient descent algorithm that allows sparse gradient aggregation from individual learners. The communication cost is amortized explicitly by a gradient aggregation interval, and global reductions are used instead of a parameter server for gradient aggregation. We prove its convergence and show that it has superior communication performance and convergence behavior over popular ASGD implementations such as Downpour and EAMSGD for deep-learning applications.
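The core mechanism the abstract describes, local SGD steps whose gradients are aggregated across learners only every few iterations through a global reduction rather than a parameter server, can be illustrated with a small simulation. The sketch below is a single-process approximation under assumptions of our own (a toy least-squares objective, an aggregation interval tau, and an in-memory average standing in for an MPI-style allreduce); it is not the authors' implementation, and names such as tau, n_learners, and shards are illustrative only.

    # Minimal single-process simulation of bulk-synchronous SGD with a gradient
    # aggregation interval (tau). Each "learner" runs local SGD on its own data
    # shard; every tau steps the gradients accumulated by all learners are
    # averaged (a stand-in for an allreduce) and applied to a globally
    # consistent copy of the parameters. Toy least-squares objective; this is
    # an illustrative sketch, not the algorithm from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_learners, tau, lr, steps = 10, 4, 8, 0.05, 200

    # Synthetic data shards: y = X @ w_true + noise, one shard per learner.
    w_true = rng.normal(size=d)
    shards = []
    for _ in range(n_learners):
        X = rng.normal(size=(256, d))
        y = X @ w_true + 0.01 * rng.normal(size=256)
        shards.append((X, y))

    w = np.zeros(d)                                  # globally consistent parameters
    local_w = [w.copy() for _ in range(n_learners)]  # each learner's local copy
    accum = [np.zeros(d) for _ in range(n_learners)] # buffered gradients per learner

    for t in range(1, steps + 1):
        for k, (X, y) in enumerate(shards):
            idx = rng.integers(0, X.shape[0], size=32)               # mini-batch
            grad = X[idx].T @ (X[idx] @ local_w[k] - y[idx]) / len(idx)
            local_w[k] -= lr * grad                                  # local SGD step
            accum[k] += grad                                         # buffer for aggregation

        if t % tau == 0:
            # Sparse aggregation: average the gradients buffered since the last
            # reduction (simulated allreduce) and apply them to the global model.
            g_avg = sum(accum) / n_learners
            w -= lr * g_avg
            local_w = [w.copy() for _ in range(n_learners)]          # re-synchronize learners
            accum = [np.zeros(d) for _ in range(n_learners)]

    print("distance to w_true:", np.linalg.norm(w - w_true))

Increasing tau reduces how often the reduction is performed, which is how the communication cost is amortized in the abstract's description; in a real multi-node setting the in-memory average would be replaced by a collective such as MPI allreduce.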

Original language: English
Title of host publication: Proceedings - 46th International Conference on Parallel Processing, ICPP 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11-20
Number of pages: 10
ISBN (Electronic): 9781538610428
DOIs
State: Published - Sep 1 2017
Externally published: Yes
Event: 46th International Conference on Parallel Processing, ICPP 2017 - Bristol, United Kingdom
Duration: Aug 14 2017 - Aug 17 2017

Publication series

Name: Proceedings of the International Conference on Parallel Processing
ISSN (Print): 0190-3918

Conference

Conference: 46th International Conference on Parallel Processing, ICPP 2017
Country/Territory: United Kingdom
City: Bristol
Period: 08/14/17 - 08/17/17

Keywords

  • Deep learning
  • Distributed processing
  • Stochastic gradient descent
