TY - GEN
T1 - FORGE: Pre-Training Open Foundation Models for Science
T2 - 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
AU - Yin, Junqi
AU - Dash, Sajal
AU - Wang, Feiyi
AU - Shankar, Mallikarjun
N1 - Publisher Copyright:
© 2023 Owner/Author(s).
PY - 2023/11/12
Y1 - 2023/11/12
AB - Large language models (LLMs) are poised to revolutionize the way we conduct scientific research. However, both model complexity and pre-training cost are impeding effective adoption by the wider science community. Identifying suitable scientific use cases, finding the optimal balance between model and data sizes, and scaling up model training are among the most pressing issues that need to be addressed. In this study, we provide practical solutions for building and using LLM-based foundation models targeting scientific research use cases. We present an end-to-end examination of the effectiveness of LLMs in scientific research, including their scaling behavior and computational requirements on Frontier, the first exascale supercomputer. We have also developed, for release to the scientific community, a suite of open foundation models called FORGE, with up to 26B parameters trained on 257B tokens from over 200M scientific articles, with performance on par with or superior to other comparable state-of-the-art models. We have demonstrated the use and effectiveness of FORGE on downstream scientific tasks. Our research establishes best practices that can be applied across various fields to take advantage of LLMs for scientific discovery.
UR - http://www.scopus.com/inward/record.url?scp=85179554975&partnerID=8YFLogxK
DO - 10.1145/3581784.3613215
M3 - Conference contribution
AN - SCOPUS:85179554975
T3 - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
BT - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
PB - Association for Computing Machinery, Inc
Y2 - 12 November 2023 through 17 November 2023
ER -