A hierarchical, bulk-synchronous stochastic gradient descent algorithm for deep-learning applications on GPU clusters

Guojing Cong, Onkar Bhardwaj

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Training data and models are becoming increasingly large in many deep-learning applications, and large-scale distributed processing is employed to accelerate training. Increasing the number of learners in synchronous and asynchronous stochastic gradient descent presents challenges for both convergence and communication performance. We present a hierarchical, bulk-synchronous stochastic gradient descent algorithm that effectively balances execution time and accuracy when training deep-learning applications on GPU clusters. At scale, it achieves markedly better convergence and execution time than asynchronous stochastic gradient descent implementations. Deployed on a cluster of 128 GPUs, our implementation achieves up to a 56-fold speedup over sequential stochastic gradient descent with similar test accuracy for our target application.
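The abstract does not spell out the hierarchy, but a common reading of hierarchical, bulk-synchronous SGD is a two-level scheme: learners inside a node average gradients synchronously at every step over the fast intra-node interconnect, while node-level model replicas are averaged across the cluster only every few steps. The sketch below simulates such a scheme with NumPy on a toy least-squares problem; the grouping (`nodes`, `gpus_per_node`) and the inter-node averaging period (`sync_period`) are illustrative assumptions, not the paper's reported configuration.

```python
# A minimal, self-contained sketch of hierarchical bulk-synchronous SGD,
# simulated with NumPy on a toy least-squares problem. The two-level scheme
# (gradient averaging inside a node every step, model averaging across nodes
# every `sync_period` steps) is an assumption for illustration; the abstract
# does not describe the paper's exact hierarchy.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + noise
n, d = 2048, 16
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def grad(w, idx):
    """Stochastic gradient of 0.5 * ||X w - y||^2 on mini-batch `idx`."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

nodes, gpus_per_node = 4, 4          # 16 simulated learners, 2-level hierarchy
batch, lr, steps, sync_period = 32, 0.05, 200, 8

# One model replica per node; GPUs within a node stay bulk-synchronous.
models = [np.zeros(d) for _ in range(nodes)]

for t in range(steps):
    for m in range(nodes):
        # Level 1: intra-node BSP step -- each GPU in the node computes a
        # gradient on its own mini-batch, and the node averages them
        # (this models a fast intra-node allreduce).
        g = np.mean(
            [grad(models[m], rng.integers(0, n, batch))
             for _ in range(gpus_per_node)],
            axis=0,
        )
        models[m] -= lr * g
    # Level 2: inter-node bulk synchronization -- average the node replicas
    # every `sync_period` steps (this models the slower cross-node allreduce).
    if (t + 1) % sync_period == 0:
        avg = np.mean(models, axis=0)
        models = [avg.copy() for _ in range(nodes)]

w = np.mean(models, axis=0)
print("parameter error:", np.linalg.norm(w - w_true))
```

Averaging node replicas only every `sync_period` steps trades a small amount of statistical efficiency for far fewer cross-node messages, which is the execution-time/accuracy balance the abstract describes.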

Original language: English
Title of host publication: Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017
Editors: Xuewen Chen, Bo Luo, Feng Luo, Vasile Palade, M. Arif Wani
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 818-821
Number of pages: 4
ISBN (Electronic): 9781538614174
DOIs
State: Published - 2017
Externally published: Yes
Event: 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017 - Cancun, Mexico
Duration: Dec 18 2017 - Dec 21 2017

Publication series

Name: Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017
Volume: 2017-December

Conference

Conference: 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017
Country/Territory: Mexico
City: Cancun
Period: 12/18/17 - 12/21/17

Keywords

  • Deep learning
  • GPU
  • Stochastic gradient descent
  • Distributed algorithm
