Data optimization for large batch distributed training of deep neural networks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Distributed training in deep learning (DL) is common practice as data and models grow. Current practice for distributed training of deep neural networks faces two challenges at scale: communication bottlenecks and a deterioration of model accuracy as the global batch size increases. Existing solutions focus on improving message-exchange efficiency and on techniques that tweak batch sizes and models during training. The loss of training accuracy typically happens because the loss function gets trapped in a local minimum. We observe that the loss landscape is shaped by both the model and the training data, and we propose a data optimization approach that uses machine learning to implicitly smooth the loss landscape, resulting in fewer local minima. Our approach filters out data points that are less important to feature learning, enabling us to speed up the training of models on larger batch sizes while improving accuracy.
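The abstract describes the data-filtering idea only at a high level and does not state the filtering criterion. As a purely hypothetical illustration (PyTorch assumed), the sketch below scores each training point by the distance of its embedding to its class centroid and keeps the most informative fraction before large-batch training; the function name select_informative_indices, the keep_fraction parameter, and the centroid-distance score are assumptions for illustration, not the paper's method.

# Hypothetical sketch: drop training points whose embeddings are highly
# redundant, then train on the reduced set with a large global batch.
# The scoring rule (distance to class centroid) is an assumption, not
# the criterion used in the paper.
import torch
from torch.utils.data import Subset, DataLoader

@torch.no_grad()
def select_informative_indices(dataset, feature_extractor, keep_fraction=0.8,
                               device="cuda", batch_size=256):
    feature_extractor.eval().to(device)
    feats, labels = [], []
    for x, y in DataLoader(dataset, batch_size=batch_size):
        feats.append(feature_extractor(x.to(device)).flatten(1).cpu())
        labels.append(y)
    feats, labels = torch.cat(feats), torch.cat(labels)

    # Score each sample by its distance to its class centroid; points very
    # close to the centroid are treated as redundant and filtered first.
    scores = torch.empty(len(feats))
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        centroid = feats[idx].mean(dim=0, keepdim=True)
        scores[idx] = (feats[idx] - centroid).norm(dim=1)

    keep = int(keep_fraction * len(feats))
    kept_indices = scores.argsort(descending=True)[:keep]
    return Subset(dataset, kept_indices.tolist())

The reduced Subset would then be wrapped in the usual DistributedSampler/DataLoader pipeline for large-batch training; that wiring is standard and omitted here.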

Original language: English
Title of host publication: Proceedings - 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1197-1203
Number of pages: 7
ISBN (Electronic): 9781728176246
DOIs
State: Published - Dec 2020
Event: 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020 - Las Vegas, United States
Duration: Dec 16, 2020 – Dec 18, 2020

Publication series

Name: Proceedings - 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020

Conference

Conference: 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020
Country/Territory: United States
City: Las Vegas
Period: 12/16/20 – 12/18/20

Funding

output to remove noisy data. Our work lends itself to conversion into an interactive tool for visualizing and investigating the workings of a model trained with this methodology, which can facilitate reproducibility and model transparency. We find that our technique also works with a pre-trained ResNet-101 used as a feature extractor, instead of training a model from scratch. We intend to expand this work to other pre-trained networks in the future. These techniques are likely to be particularly effective in domain areas such as anomaly detection, data compression, and data filtering.

ACKNOWLEDGMENT

This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility, and the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
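The text notes that the technique also works when a pre-trained ResNet-101 is used as the feature extractor rather than training a model from scratch. A minimal sketch of building such an extractor, assuming torchvision and the hypothetical filtering function above, might look like:

# Minimal sketch, assuming torchvision: reuse a pre-trained ResNet-101
# with its classification head removed as the feature extractor fed into
# the hypothetical filtering step sketched earlier.
import torch.nn as nn
from torchvision.models import resnet101, ResNet101_Weights

backbone = resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop final fc layer

# Hypothetical usage with the filtering sketch above:
# filtered_train_set = select_informative_indices(train_set, feature_extractor)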
