FORGE: Pre-Training Open Foundation Models for Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Large language models (LLMs) are poised to revolutionize the way we conduct scientific research. However, both model complexity and pre-training cost are impeding effective adoption by the wider science community. Identifying suitable scientific use cases, finding the optimal balance between model and data sizes, and scaling up model training are among the most pressing issues that need to be addressed. In this study, we provide practical solutions for building and using LLM-based foundation models targeting scientific research use cases. We present an end-to-end examination of the effectiveness of LLMs in scientific research, including their scaling behavior and computational requirements on Frontier, the first exascale supercomputer. We have also developed, for release to the scientific community, a suite of open foundation models called FORGE with up to 26B parameters, trained on 257B tokens from over 200M scientific articles, with performance on par with or superior to comparable state-of-the-art models. We have demonstrated the use and effectiveness of FORGE on scientific downstream tasks. Our research establishes best practices that can be applied across various fields to take advantage of LLMs for scientific discovery.
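The abstract highlights the interplay between model size, data size, and pre-training compute. As a rough illustration of how such computational requirements are typically estimated, the sketch below applies the widely used C ≈ 6·N·D approximation for dense transformer training FLOPs to the model and token counts stated in the abstract. The per-GPU throughput and job size are illustrative assumptions, not figures reported in the paper.

```python
# Back-of-the-envelope pre-training compute estimate using the standard
# C ~= 6 * N * D approximation for dense transformers. Model and token
# counts are taken from the abstract; throughput and GPU count below are
# illustrative assumptions only.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens

n_params = 26e9    # largest FORGE model: 26B parameters (from the abstract)
n_tokens = 257e9   # pre-training corpus: 257B tokens (from the abstract)

total_flops = training_flops(n_params, n_tokens)

# Hypothetical sustained throughput and job size; actual values depend on
# hardware, parallelism strategy, and kernel efficiency.
assumed_flops_per_gpu = 100e12   # 100 TFLOP/s sustained per GPU (assumption)
n_gpus = 1024                    # assumed number of GPUs (assumption)

gpu_seconds = total_flops / assumed_flops_per_gpu
wall_clock_days = gpu_seconds / n_gpus / 86_400

print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"Estimated wall-clock time on {n_gpus} GPUs: {wall_clock_days:.1f} days")
```

Under these assumptions the estimate comes out to roughly 4e22 FLOPs and a few days of wall-clock time at the assumed scale; the paper itself reports the measured scaling behavior and requirements on Frontier.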

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400701092
DOIs
State: Published - Nov 12 2023
Event: 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023 - Denver, United States
Duration: Nov 12 2023 – Nov 17 2023

Publication series

Name: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023

Conference

Conference: 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
Country/Territory: United States
City: Denver
Period: 11/12/23 – 11/17/23

Funding

J.Y. would like to thank Quentin Anthony and Stella Biderman from EleutherAI for valuable discussions. This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funders and funder numbers:
U.S. Department of Energy: DE-AC05-00OR22725
Office of Science
