Anderson Acceleration for Distributed Training of Deep Learning Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Anderson acceleration (AA) is an extrapolation technique that has recently gained interest in the deep learning (DL) community as a way to speed up the sequential training of DL models. However, when performed at large scale, DL training is exposed to a higher risk of getting trapped in steep local minima of the training loss function, and standard AA does not provide sufficient acceleration to escape from these steep local minima. This results in poor generalizability and makes AA ineffective. To restore AA's advantage in speeding up the training of DL models on large-scale computing platforms, we combine AA with an adaptive moving average procedure that boosts the training to escape from steep local minima. By monitoring the relative standard deviation between consecutive iterations, we also introduce a criterion to automatically assess whether the moving average is needed. We applied the method to the following DL instantiations for image classification: (i) ResNet50 trained on the open-source CIFAR100 dataset and (ii) ResNet50 trained on the open-source ImageNet1k dataset. Numerical results obtained using up to 1,536 NVIDIA V100 GPUs on the OLCF supercomputer Summit showed the stabilizing effect of the moving average on AA for all the problems above.
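The abstract does not spell out the exact update rule, so the following is a minimal NumPy sketch of the general idea only: a windowed Anderson-acceleration step combined with an exponential moving average of the iterates that is applied when a relative-standard-deviation criterion fires. The function names (anderson_step, needs_smoothing, train_with_aa), the history window, the averaging coefficient, and the threshold are illustrative assumptions and not the authors' implementation.

    import numpy as np

    def anderson_step(x_hist, g_hist, reg=1e-10):
        """One Anderson-acceleration update from histories of iterates x_i
        and fixed-point evaluations g(x_i), most recent last."""
        # Residuals f_i = g(x_i) - x_i for the stored window.
        F = np.stack([g - x for x, g in zip(x_hist, g_hist)])   # shape (m, n)
        G = np.stack(g_hist)                                     # shape (m, n)
        m = F.shape[0]
        # Solve min_alpha || sum_i alpha_i f_i ||^2  s.t.  sum_i alpha_i = 1
        # via regularized normal equations, then normalize the weights.
        A = F @ F.T + reg * np.eye(m)
        alpha = np.linalg.solve(A, np.ones(m))
        alpha /= alpha.sum()
        # Mixed update: weighted combination of the stored g(x_i).
        return alpha @ G

    def needs_smoothing(x_prev, x_cand, tol=0.1):
        """Hypothetical criterion: smooth when the relative standard deviation
        (std / |mean|) of the change between consecutive iterates is large."""
        delta = x_cand - x_prev
        rel_std = np.std(delta) / (np.abs(np.mean(delta)) + 1e-12)
        return rel_std > tol

    def train_with_aa(g, x0, iters=50, window=5, beta=0.9):
        """Fixed-point iteration x <- g(x), accelerated with windowed AA and
        stabilized by an exponential moving average when the criterion fires."""
        x = x0.copy()
        x_avg = x0.copy()
        x_hist, g_hist = [], []
        for _ in range(iters):
            gx = g(x)
            x_hist.append(x.copy())
            g_hist.append(gx.copy())
            if len(x_hist) > window:
                x_hist.pop(0)
                g_hist.pop(0)
            # Plain fixed-point step until enough history is available.
            x_cand = anderson_step(x_hist, g_hist) if len(x_hist) > 1 else gx
            # Moving average of the candidate iterates.
            x_avg = beta * x_avg + (1.0 - beta) * x_cand
            x = x_avg if needs_smoothing(x, x_cand) else x_cand
        return x

In the paper the map being iterated corresponds to a distributed stochastic-gradient training step on ResNet50; in this sketch g can be any fixed-point map supplied by the caller to exercise the routine.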

Original language: English
Title of host publication: SoutheastCon 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 289-295
Number of pages: 7
ISBN (Electronic): 9781665406529
DOIs
State: Published - 2022
Event: SoutheastCon 2022 - Mobile, United States
Duration: Mar 26, 2022 - Apr 3, 2022

Publication series

Name: Conference Proceedings - IEEE SOUTHEASTCON
Volume: 2022-March
ISSN (Print): 1091-0050
ISSN (Electronic): 1558-058X

Conference

Conference: SoutheastCon 2022
Country/Territory: United States
City: Mobile
Period: 03/26/22 - 04/03/22

Funding

This work used resources of the Oak Ridge Leadership Computing Facility, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research is sponsored by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy.

Keywords

  • Artificial Intelligence
  • High Performance Computing
  • Multicore processing
