Practical efficiency of asynchronous stochastic gradient descent

Onkar Bhardwaj, Guojing Cong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Stochastic gradient descent (SGD) and its distributed variants are essential for leveraging modern computing resources in large-scale machine learning tasks. ASGD [1] is one of the most popular asynchronous distributed variants of SGD. Recent mathematical analyses have shown that, under certain assumptions on the learning task (and ignoring communication cost), ASGD exhibits linear speed-up asymptotically. In practice, however, ASGD does not achieve linear speed-up as the number of learners increases. Motivated by this, we investigate the finite-time convergence properties of ASGD. We observe that the learning rate required by mathematical analyses to guarantee linear speed-up can be very small (and practically sub-optimal with respect to convergence speed), whereas learning rates chosen in practice (for quick convergence) exhibit sub-linear speed-up. We show that this observation can in fact be supported by mathematical analysis: in the finite-time regime, better convergence-rate guarantees can be proven for ASGD with a small number of learners, indicating a lack of linear speed-up as the number of learners increases. We thus conclude that, even ignoring communication cost, ASGD has an inherent inefficiency with respect to increasing the number of learners.
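To illustrate the asynchronous update pattern the abstract describes, the following is a minimal toy sketch of ASGD: several learners compute gradients from possibly stale parameter copies, and a parameter server applies each gradient as it arrives. The objective (f(w) = w²/2), the staleness model, and all names here are illustrative assumptions, not the paper's experimental setup.

```python
import random

def asgd_simulation(num_learners=4, steps=2000, lr=0.05, seed=0):
    """Toy asynchronous SGD on f(w) = w^2 / 2 (gradient: w).

    Each learner holds a stale copy of the parameter read at its last
    update; the server applies incoming gradients without waiting for
    other learners, so gradients may be computed from outdated values.
    """
    rng = random.Random(seed)
    w = 5.0                          # server-side parameter
    stale = [w] * num_learners       # each learner's last-read parameter
    for _ in range(steps):
        k = rng.randrange(num_learners)        # learner k finishes first
        grad = stale[k] + rng.gauss(0.0, 0.1)  # noisy gradient at a stale point
        w -= lr * grad                         # server applies update immediately
        stale[k] = w                           # learner k re-reads the parameter
    return w
```

With a small learning rate the staleness (on average, `num_learners` server updates between a learner's read and its push) is tolerated and `w` drifts toward the optimum at 0; raising `lr` or `num_learners` enlarges the effective delay term, which is the practical tension between fast convergence and linear speed-up that the paper analyzes.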

Original language: English
Title of host publication: Proceedings of MLHPC 2016
Subtitle of host publication: Machine Learning in HPC Environments - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 56-62
Number of pages: 7
ISBN (Electronic): 9781509038824
DOIs
State: Published - Jan 27 2017
Externally published: Yes
Event: 2016 Machine Learning in HPC Environments, MLHPC 2016 - Salt Lake City, United States
Duration: Nov 14 2016 → …

Publication series

Name: Proceedings of MLHPC 2016: Machine Learning in HPC Environments - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2016 Machine Learning in HPC Environments, MLHPC 2016
Country/Territory: United States
City: Salt Lake City
Period: 11/14/16 → …

