TY - GEN
T1 - Comparative Study of Large Language Model Architectures on Frontier
AU - Yin, Junqi
AU - Bose, Avishek
AU - Cong, Guojing
AU - Lyngaas, Isaac
AU - Anthony, Quentin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture, spawning numerous variants. However, these variants have undergone pre-training under diverse conditions, including variations in input data, data preprocessing, and training methodologies, resulting in a lack of controlled comparative studies. Here we meticulously examine two prominent open-source GPT architectures, GPT-NeoX and LLaMA, leveraging the computational power of Frontier, the world's first exascale supercomputer. Employing the same materials science text corpus and a comprehensive end-to-end pipeline, we conduct a comparative analysis of their training and downstream performance. Our efforts culminate in state-of-the-art performance on a challenging materials science benchmark. Furthermore, we investigate computational and energy efficiency and propose a computationally efficient method for architecture design. To our knowledge, these pre-trained models are the largest available for materials science. Our findings provide practical guidance for building LLMs on HPC platforms.
KW - AI foundation model
KW - GPT architecture
KW - HPC
UR - http://www.scopus.com/inward/record.url?scp=85198901579&partnerID=8YFLogxK
U2 - 10.1109/IPDPS57955.2024.00056
DO - 10.1109/IPDPS57955.2024.00056
M3 - Conference contribution
AN - SCOPUS:85198901579
T3 - Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
SP - 556
EP - 569
BT - Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
Y2 - 27 May 2024 through 31 May 2024
ER -