Evaluation of pre-training large language models on leadership-class supercomputers

Research output: Contribution to journal › Article › peer-review


Abstract

Large language models (LLMs) have risen rapidly to the center stage of artificial intelligence as the foundation models applicable to many downstream learning tasks. However, how to effectively build, train, and serve such models for many high-stakes and first-principles-based scientific use cases is both of great interest and of great challenge. Moreover, pre-training LLMs with billions or even trillions of parameters can be prohibitively expensive not just for academic institutions, but also for well-funded industrial and government labs. Furthermore, the energy cost and the environmental impact of developing LLMs must be kept in mind. In this work, we conduct a first-of-its-kind performance analysis to understand the time and energy cost of pre-training LLMs on the Department of Energy (DOE)'s leadership-class supercomputers. Employing state-of-the-art distributed training techniques, we evaluate the computational performance of various parallelization approaches at scale for a range of model sizes, and establish a projection model for the cost of full training. Our findings provide baseline results, best practices, and heuristics for pre-training such large models that should be valuable to the HPC community at large. We also offer insights and optimization strategies for using the first exascale computing system, Frontier, to train models of the size of GPT-3 and beyond.
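The abstract refers to a projection model for the cost of full training; the paper's own model is not reproduced here. As a rough illustration of the general idea, the sketch below applies the widely used ~6·N·D FLOPs-per-training rule of thumb, with assumed (hypothetical) per-GPU sustained throughput and power-draw figures, to project wall-clock time and energy.

```python
# Minimal sketch of a FLOPs-based projection for full pre-training cost.
# This is NOT the paper's projection model; it uses the common ~6*N*D
# FLOPs estimate for forward + backward passes, and the throughput and
# power figures in the example are illustrative assumptions, not
# measured Frontier values.

def projected_training_cost(
    n_params: float,                 # model parameters, e.g. 175e9 for GPT-3 scale
    n_tokens: float,                 # training tokens, e.g. 300e9
    n_gpus: int,                     # GPUs used for the run
    achieved_tflops_per_gpu: float,  # sustained TFLOP/s per GPU (assumed)
    gpu_power_watts: float,          # average power draw per GPU (assumed)
):
    """Return (wall-clock days, megawatt-hours of energy) for full training."""
    total_flops = 6.0 * n_params * n_tokens                  # ~6*N*D estimate
    cluster_flops = n_gpus * achieved_tflops_per_gpu * 1e12  # aggregate FLOP/s
    seconds = total_flops / cluster_flops
    days = seconds / 86400.0
    energy_mwh = n_gpus * gpu_power_watts * seconds / 3600.0 / 1e6
    return days, energy_mwh

# Example: GPT-3-sized model on 1024 GPUs, assuming 100 sustained TFLOP/s
# and 500 W per GPU (both hypothetical numbers chosen for illustration).
days, mwh = projected_training_cost(175e9, 300e9, 1024, 100.0, 500.0)
print(f"~{days:.0f} days, ~{mwh:.0f} MWh")
```

In practice, the achieved throughput term would come from measured scaling runs of the chosen parallelization strategy, which is what makes such a projection specific to a given system.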

Original language: English
Pages (from-to): 20747-20768
Number of pages: 22
Journal: Journal of Supercomputing
Volume: 79
Issue number: 18
DOIs
State: Published - Dec 2023

Funding

J.Y. would like to thank Quentin Anthony and Stella Biderman from EleutherAI, and Less Wright and Geeta Chauhan from the PyTorch team, for the valuable discussions. This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funders and funder numbers:
U.S. Department of Energy: DE-AC05-00OR22725
Office of Science

Keywords

• AI foundation model
• Distributed training
• Performance analysis
