Improving Scalability of Parallel CNN Training by Adjusting Mini-Batch Size at Run-Time

Sunwoo Lee, Qiao Kang, Sandeep Madireddy, Prasanna Balaprakash, Ankit Agrawal, Alok Choudhary, Richard Archibald, Wei Keng Liao

Research output: Conference contribution (peer-reviewed), in book/report/conference proceeding


Abstract

Training a Convolutional Neural Network (CNN) is a computationally intensive task that requires efficient parallelization to shorten the execution time. Given the ever-increasing size of available training data, parallelizing CNN training becomes even more important. Data parallelism, a popular parallelization strategy that distributes the input data among compute processes, requires the mini-batch size to be sufficiently large to achieve a high degree of parallelism. However, training with a large batch size is known to yield lower convergence accuracy. In image restoration problems, for example, the batch size is typically tuned to a small value between 16 and 64, making it challenging to scale up the training. In this paper, we propose a parallel CNN training strategy that gradually increases the mini-batch size and learning rate at run-time. While improving scalability, this strategy also keeps the accuracy close to that of training with a fixed small batch size. We evaluate the performance of the proposed parallel CNN training algorithm on image regression and classification applications using various models and datasets.
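
To make the idea concrete, below is a minimal single-process sketch of such an adaptive schedule in PyTorch: training starts with a small batch size, and at preset epochs the batch size is doubled while the learning rate is scaled proportionally (a linear scaling rule). The growth epochs, base values, and scaling rule here are illustrative assumptions, not the paper's exact schedule, which operates in a data-parallel setting across compute processes.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical schedule: epochs at which the batch size doubles.
    GROWTH_EPOCHS = {10, 20, 30}  # assumed values, not from the paper
    BASE_BATCH, BASE_LR = 16, 0.01

    # Toy data and model so the sketch runs end to end.
    x = torch.randn(1024, 3, 32, 32)
    y = torch.randint(0, 10, (1024,))
    dataset = TensorDataset(x, y)
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR)
    loss_fn = nn.CrossEntropyLoss()

    batch_size, lr = BASE_BATCH, BASE_LR
    for epoch in range(40):
        if epoch in GROWTH_EPOCHS:
            # Grow the batch and scale the learning rate in proportion,
            # so the per-sample update magnitude stays roughly constant.
            batch_size *= 2
            lr *= 2
            for group in optimizer.param_groups:
                group["lr"] = lr
        # Rebuild the loader so the new batch size takes effect.
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()

In a data-parallel run, a larger global batch lets more processes contribute useful work per step, which is the source of the scalability gain the abstract refers to.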

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 830-839
Number of pages: 10
ISBN (Electronic): 9781728108582
DOIs
State: Published - Dec 2019
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9, 2019 - Dec 12, 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country/Territory: United States
City: Los Angeles
Period: 12/9/19 - 12/12/19

Funding

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. This work is also supported in part by DOE awards DE-SC0014330 and DE-SC0019358. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

Keywords

  • Adaptive Batch Size
  • Convolutional Neural Network
  • Deep Learning
  • Parallelization
