MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training

Jonghyun Bae, Jong Youl Choi, Massimiliano Lupo Pasini, Kshitij Mehta, Khaled Z. Ibrahim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In this work, we propose MDLoader, a hybrid in-memory data loader for distributed deep neural network training. MDLoader introduces a model-driven performance estimator that automatically switches between one-sided and collective communication at runtime.
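
The abstract does not describe the estimator itself, so the following is only a minimal sketch of how a model-driven switch of this kind could look. It assumes a simple latency-bandwidth cost model, in which one-sided access pays one message startup per remote sample and the collective is an all-gather-style exchange of every rank's batch. The names `LinkModel`, `one_sided_cost`, `collective_cost`, and `choose_mode`, as well as the cost formulas and parameter values, are hypothetical and are not taken from MDLoader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical network parameters for a latency-bandwidth cost model."""
    latency_s: float       # startup cost per message, in seconds
    bandwidth_bps: float   # sustained bandwidth, in bytes per second


def one_sided_cost(batch_per_rank: int, sample_bytes: int, link: LinkModel) -> float:
    # Assumption: every sample in the local batch is fetched with its own
    # remote get, so each one pays the startup latency plus its transfer time.
    per_sample = link.latency_s + sample_bytes / link.bandwidth_bps
    return batch_per_rank * per_sample


def collective_cost(num_ranks: int, batch_per_rank: int, sample_bytes: int,
                    link: LinkModel) -> float:
    # Assumption: an all-gather-style exchange with ceil(log2(P)) startup steps,
    # in which each rank receives the batches of the other P - 1 ranks.
    steps = (num_ranks - 1).bit_length()  # exact ceil(log2(P)) for P >= 1
    volume = (num_ranks - 1) * batch_per_rank * sample_bytes
    return steps * link.latency_s + volume / link.bandwidth_bps


def choose_mode(num_ranks: int, batch_per_rank: int, sample_bytes: int,
                link: LinkModel) -> str:
    """Return the communication mode with the lower estimated cost."""
    one_sided = one_sided_cost(batch_per_rank, sample_bytes, link)
    collective = collective_cost(num_ranks, batch_per_rank, sample_bytes, link)
    return "one_sided" if one_sided <= collective else "collective"


# Illustrative parameters only: 2 microseconds latency, 10 GB/s bandwidth.
link = LinkModel(latency_s=2e-6, bandwidth_bps=1e10)
small_job = choose_mode(num_ranks=4, batch_per_rank=32, sample_bytes=1024, link=link)
large_job = choose_mode(num_ranks=1024, batch_per_rank=32, sample_bytes=1_000_000, link=link)
```

Under these assumed parameters the sketch picks the collective for a few ranks exchanging small samples, where startup latency dominates, and one-sided access for many ranks with large samples, where the collective's data volume dominates. A real estimator would presumably be calibrated against measurements on the target system.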

Original language: English
Title of host publication: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1193-1195
Number of pages: 3
ISBN (Electronic): 9798350364606
State: Published - 2024
Event: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 - San Francisco, United States
Duration: May 27 2024 to May 31 2024

Publication series

Name: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024

Conference

Conference: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
Country/Territory: United States
City: San Francisco
Period: 05/27/24 to 05/31/24

Funding

This work has been supported by the SciDAC Institute for Computer Science, Data, and Artificial Intelligence (RAPIDS), Lawrence Berkeley National Laboratory, which is operated by the University of California for the U.S. Department of Energy under contract DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory and the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725 and No. DE-AC02-05CH11231 using NERSC award ASCRERCAP0025216, respectively. This research is sponsored by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725.

Keywords

  • Collective communication
  • Graph Neural Network
  • One-sided communication
  • Performance estimator
