Abstract
Large language models (LLMs) are poised to revolutionize the way we conduct scientific research. However, both model complexity and pre-training cost are impeding their effective adoption by the wider science community. Identifying suitable scientific use cases, finding the optimal balance between model and data sizes, and scaling up model training are among the most pressing issues that need to be addressed. In this study, we provide practical solutions for building and using LLM-based foundation models targeting scientific research use cases. We present an end-to-end examination of the effectiveness of LLMs in scientific research, including their scaling behavior and computational requirements on Frontier, the first exascale supercomputer. We have also developed, for release to the scientific community, a suite of open foundation models called FORGE, with up to 26B parameters and trained on 257B tokens from over 200M scientific articles, whose performance is on par with or superior to that of comparable state-of-the-art models. We demonstrate the use and effectiveness of FORGE on scientific downstream tasks. Our research establishes best practices that can be applied across various fields to take advantage of LLMs for scientific discovery.
| Original language | English |
|---|---|
| Title of host publication | SC 2023 - International Conference for High Performance Computing, Networking, Storage and Analysis |
| Publisher | IEEE Computer Society |
| ISBN (Electronic) | 9798400701092 |
| DOIs | |
| State | Published - 2023 |
| Event | 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023 - Denver, United States. Duration: Nov 12 2023 → Nov 17 2023 |
Publication series
| Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
|---|---|
| ISSN (Print) | 2167-4329 |
| ISSN (Electronic) | 2167-4337 |
Conference
| Conference | 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023 |
|---|---|
| Country/Territory | United States |
| City | Denver |
| Period | 11/12/23 → 11/17/23 |
Funding
J.Y. would like to thank Quentin Anthony and Stella Biderman from EleutherAI for valuable discussions. This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.