Abstract
Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture, spawning numerous variants. However, these variants have undergone pre-training under diverse conditions, including variations in input data, data preprocessing, and training methodologies, resulting in a lack of controlled comparative studies. Here we meticulously examine two prominent open-source GPT architectures, GPT-NeoX and LLaMA, leveraging the computational power of Frontier, the world's first exascale supercomputer. Employing the same materials science text corpus and a comprehensive end-to-end pipeline, we conduct a comparative analysis of their training and downstream performance. Our efforts culminate in achieving state-of-the-art performance on a challenging materials science benchmark. Furthermore, we investigate computational and energy efficiency and propose a computationally efficient method for architecture design. To our knowledge, these pre-trained models represent the largest available for materials science. Our findings provide practical guidance for building LLMs on HPC platforms.
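As a hedged illustration of the controlled comparison the abstract describes, the sketch below instantiates GPT-NeoX and LLaMA decoder architectures at a matched, deliberately small size using Hugging Face Transformers configuration classes and compares their parameter counts. The model dimensions and vocabulary size are assumptions chosen for the example; they are not the configurations, corpus, or training pipeline used in the paper.

```python
# Hypothetical sketch: building GPT-NeoX and LLaMA at a matched parameter
# budget so that differences in training behavior can be attributed to the
# architecture rather than model size. All dimensions below are illustrative
# assumptions, not the paper's settings.
from transformers import (
    GPTNeoXConfig, GPTNeoXForCausalLM,
    LlamaConfig, LlamaForCausalLM,
)

# Shared (assumed) dimensions so both models see the same budget.
common = dict(
    vocab_size=50_304,
    hidden_size=512,
    num_hidden_layers=8,
    num_attention_heads=8,
    max_position_embeddings=2048,
)

# GPT-NeoX uses a standard MLP; LLaMA uses a gated (SwiGLU) MLP, so its
# intermediate size is chosen smaller to keep parameter counts comparable.
neox = GPTNeoXForCausalLM(GPTNeoXConfig(intermediate_size=2048, **common))
llama = LlamaForCausalLM(LlamaConfig(intermediate_size=1376, **common))

for name, model in [("GPT-NeoX", neox), ("LLaMA", llama)]:
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```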
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 556-569 |
| Number of pages | 14 |
| ISBN (Electronic) | 9798350337662 |
| DOIs | |
| State | Published - 2024 |
| Event | 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - San Francisco, United States. Duration: May 27 2024 → May 31 2024 |
Publication series

| Name | Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 |
|---|---|
Conference

| Conference | 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 |
|---|---|
| Country/Territory | United States |
| City | San Francisco |
| Period | 05/27/24 → 05/31/24 |
Funding
This research was partially funded by a Lab Directed Research and Development project at Oak Ridge National Laboratory, a U.S. Department of Energy facility managed by UT-Battelle, LLC. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- AI foundation model
- GPT architecture
- HPC