Abstract
Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources (such as varying physical groundings or data acquisition systems) and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 935-948 |
| Number of pages | 14 |
| ISBN (Electronic) | 9798400714665 |
| State | Published - Nov 15 2025 |
| Event | 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 - St. Louis, United States. Duration: Nov 16 2025 → Nov 21 2025 |
Publication series
| Name | Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 |
|---|---|
Conference
| Conference | 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 |
|---|---|
| Country/Territory | United States |
| City | St. Louis |
| Period | 11/16/25 → 11/21/25 |
Funding
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. It was additionally supported by ORNL's AI Initiative, sponsored by the Director's Research and Development Program at ORNL, and by the Center for Bioenergy Innovation (CBI), which is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. Use of the Advanced Plant Phenotyping Laboratory at Oak Ridge National Laboratory is acknowledged. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the US DOE under Contract Number DE-AC05-00OR22725.
Keywords
- Computing methodologies
- Distributed deep learning
- Machine learning algorithms
- Parallel algorithms
Fingerprint
Dive into the research topics of 'Distributed Cross-Channel Hierarchical Aggregation for Foundation Models'. Together they form a unique fingerprint.