ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong Youl Choi, Ashwin M. Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitations, we introduce the Oak Ridge Base Foundation Model for Earth System Predictability (ORBIT), an advanced vision transformer model that scales up to 113 billion parameters using a novel hybrid tensor-data orthogonal parallelism technique. As the largest model of its kind, ORBIT surpasses the current climate AI foundation model size by a thousandfold. Performance scaling tests conducted on the Frontier supercomputer have demonstrated that ORBIT achieves 684 petaFLOPS to 1.6 exaFLOPS sustained throughput, with scaling efficiency maintained at 41% to 85% across 49,152 AMD GPUs. These breakthroughs establish new advances in AI-driven climate modeling and demonstrate promise to significantly improve Earth system predictability.

Original language: English
Title of host publication: Proceedings of SC 2024
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350352917
DOIs
State: Published - 2024
Event: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States
Duration: Nov 17 2024 to Nov 22 2024

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Country/Territory: United States
City: Atlanta
Period: 11/17/24 to 11/22/24

Funding

The authors thank Verónica G. Melesse Vergara, Mallikarjun (Arjun) Shankar, and Bronson Messer for their support of high performance computing resources. Additionally, we thank Vishwas Rao for his valuable feedback on the development of this paper. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher acknowledges that the US government retains a nonexclusive worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory (ORNL), which is supported by the Office of Science of the US Department of Energy (DOE). This research was primarily supported by ORNL's AI Initiative, sponsored by the Director's Research and Development Program at ORNL, and additionally supported by the BER-ASCR SciDAC Program in the DOE and by a DOE Early Career Project sponsored by the BER program.
