Comparative Study of Large Language Model Architectures on Frontier

Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas, Quentin Anthony

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture, spawning numerous variants. However, these variants have undergone pre-training under diverse conditions, including variations in input data, data preprocessing, and training methodologies, resulting in a lack of controlled comparative studies. Here we meticulously examine two prominent open-sourced GPT architectures, GPT-NeoX and LLaMA, leveraging the computational power of Frontier, the world's first Exascale supercomputer. Employing the same materials science text corpus and a comprehensive end-to-end pipeline, we conduct a comparative analysis of their training and downstream performance. Our efforts culminate in achieving state-of-the-art performance on a challenging materials science benchmark. Furthermore, we investigate the computation and energy efficiency, and propose a computationally efficient method for architecture design. To our knowledge, these pre-trained models represent the largest available for materials science. Our findings provide practical guidance for building LLMs on HPC platforms.

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 556-569
Number of pages: 14
ISBN (Electronic): 9798350337662
State: Published - 2024
Event: 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - San Francisco, United States
Duration: May 27, 2024 to May 31, 2024

Publication series

Name: Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024

Conference

Conference: 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
Country/Territory: United States
City: San Francisco
Period: 05/27/24 to 05/31/24

Keywords

  • AI foundation model
  • GPT architecture
  • HPC
