Accelerating Distributed ML Training via Selective Synchronization (Poster Abstract)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In bulk-synchronous parallel (BSP), or simply synchronous, training, deep neural networks (DNNs) are launched across multiple workers concurrently, and the workers aggregate their local updates either through a parameter server (PS) [1] or via decentralized AllReduce [2]. The aggregation step on every iteration is therefore blocking, i.e., all workers must wait for the reduction phase to complete before proceeding to the next step. ML accelerators like GPUs and TPUs have reduced computation times, but communication cost continues to grow with the size of DNNs. Even with weak scaling and Gustafson's law [3], distributed training does not scale linearly with the number of workers due to this high synchronization overhead.
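To make the blocking nature of BSP aggregation concrete, below is a minimal sketch of one synchronous training step using PyTorch's torch.distributed AllReduce. It is illustrative only, not the selective-synchronization method this poster proposes, and it assumes a process group has already been initialized (e.g., via torchrun) with each worker holding its own data shard.

```python
# Minimal sketch of a BSP (synchronous) data-parallel training step.
# Illustrative only; not the authors' selective-synchronization method.
# Assumes torch.distributed is initialized (e.g., launched via torchrun).
import torch
import torch.distributed as dist


def bsp_train_step(model, optimizer, loss_fn, inputs, targets):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()  # compute local gradients on this worker

    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Blocking collective: no worker proceeds until all workers
            # reach this point -- the synchronization overhead the
            # abstract describes.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size  # average the summed gradients

    optimizer.step()  # every worker applies the identical update
    return loss.item()
```

The dist.all_reduce call is the synchronization point: the fastest worker idles until the slowest one finishes its backward pass, and this communication cost grows with the number of model parameters.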

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Cluster Computing Workshops and Posters, CLUSTER Workshops 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 56-57
Number of pages: 2
ISBN (Electronic): 9798350370621
DOIs
State: Published - 2023
Externally published: Yes
Event: 25th IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2023 - Santa Fe, United States
Duration: Oct 31, 2023 - Nov 3, 2023

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 25th IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2023
Country/Territory: United States
City: Santa Fe
Period: 10/31/23 - 11/3/23

Keywords

  • deep learning
  • distributed training
  • machine learning
