Abstract
In this work, we propose MDLoader, a hybrid in-memory data loader for distributed deep neural networks. MDLoader introduces a model-driven performance estimator to automatically switch between one-sided and collective communication at runtime.
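
The runtime switch described above can be illustrated with a minimal sketch. This is not the estimator from the paper: it assumes a simple latency-bandwidth (alpha-beta) cost model, and every name in it (`LinkModel`, `estimate_one_sided`, `estimate_collective`, `choose_mode`) and every parameter value is hypothetical, chosen only to show how a model-driven choice between the two communication modes might be structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical alpha-beta link parameters (assumed, not from the paper)."""
    latency_s: float       # per-message startup cost, in seconds
    bandwidth_bps: float   # sustained bandwidth, in bytes per second


def estimate_one_sided(n_remote: int, sample_bytes: int, link: LinkModel) -> float:
    """Assumed cost of fetching each remote sample with its own get-style request."""
    return n_remote * (link.latency_s + sample_bytes / link.bandwidth_bps)


def estimate_collective(num_ranks: int, n_remote: int, sample_bytes: int,
                        link: LinkModel) -> float:
    """Assumed cost of one all-to-all style exchange: one message per peer,
    carrying the same total number of bytes as the one-sided case."""
    peers = num_ranks - 1
    return peers * link.latency_s + n_remote * sample_bytes / link.bandwidth_bps


def choose_mode(num_ranks: int, n_remote: int, sample_bytes: int,
                link: LinkModel) -> str:
    """Return the mode with the lower estimated time under this toy model."""
    t_one = estimate_one_sided(n_remote, sample_bytes, link)
    t_coll = estimate_collective(num_ranks, n_remote, sample_bytes, link)
    return "one_sided" if t_one < t_coll else "collective"


if __name__ == "__main__":
    link = LinkModel(latency_s=2e-6, bandwidth_bps=1e10)  # illustrative values
    # Few remote samples spread over many ranks.
    print(choose_mode(num_ranks=64, n_remote=4, sample_bytes=4096, link=link))
    # Many remote samples on a small number of ranks.
    print(choose_mode(num_ranks=8, n_remote=128, sample_bytes=4096, link=link))
```

Under this toy model the byte-transfer term is identical for both modes, so the decision reduces to comparing message counts: per-sample requests win when a rank needs fewer remote samples than it has peers, and the collective exchange wins otherwise. A real estimator would need measured parameters and additional terms, such as synchronization cost.
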
Original language | English |
---|---|
Title of host publication | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1193-1195 |
Number of pages | 3 |
ISBN (Electronic) | 9798350364606 |
State | Published - 2024 |
Event | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 - San Francisco, United States. Duration: May 27, 2024 → May 31, 2024 |
Publication series
Name | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 |
---|---|
Conference
Conference | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 05/27/24 → 05/31/24 |
Funding
This work has been supported by the SciDAC Institute for Computer Science, Data, and Artificial Intelligence (RAPIDS), Lawrence Berkeley National Laboratory, which is operated by the University of California for the U.S. Department of Energy under contract DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory and the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725 and No. DE-AC02-05CH11231 using NERSC award ASCRERCAP0025216, respectively. This research is sponsored by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725.
Keywords
- Collective communication
- Graph Neural Network
- One-sided communication
- Performance estimator