Flexible Communication for Optimal Distributed Learning over Unpredictable Networks

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Gradient compression alleviates expensive communication in distributed deep learning by sending fewer values and its corresponding indices, typically via Allgather (AG). Training with high compression ratio (CR) achieves high accuracy like DenseSGD, but has lower parallel scaling due to high communication cost (i.e., parallel efficiency). Using lower CRs improves parallel efficiency by lowering synchronization cost, but degrades model accuracy as well (statistical efficiency). Further, speedup attained with different models and CRs also varies with network latency, effective bandwidth and collective op used for aggregation. In many cases, collectives like Allreduce (AR) have lower cost than AG to exchange the same amount of data. In this paper, we propose an AR-compatible Top k compressor that is bandwidth-optimal and thus performs better than AG in certain network configurations. We develop a flexible communication strategy that switches between AG and AR based on which collective is optimal in the current settings, and model the pareto-relationship between parallel and statistical efficiency as a multiobjective optimization (MOO) problem to dynamically adjust CR and accelerate training while still converging to high accuracy.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE International Conference on Big Data, BigData 2023
EditorsJingrui He, Themis Palpanas, Xiaohua Hu, Alfredo Cuzzocrea, Dejing Dou, Dominik Slezak, Wei Wang, Aleksandra Gruca, Jerry Chun-Wei Lin, Rakesh Agrawal
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages925-935
Number of pages11
ISBN (Electronic)9798350324457
DOIs
StatePublished - 2023
Externally publishedYes
Event2023 IEEE International Conference on Big Data, BigData 2023 - Sorrento, Italy
Duration: Dec 15 2023Dec 18 2023

Publication series

NameProceedings - 2023 IEEE International Conference on Big Data, BigData 2023

Conference

Conference2023 IEEE International Conference on Big Data, BigData 2023
Country/TerritoryItaly
CitySorrento
Period12/15/2312/18/23

Fingerprint

Dive into the research topics of 'Flexible Communication for Optimal Distributed Learning over Unpredictable Networks'. Together they form a unique fingerprint.

Cite this