chatHPC: Empowering HPC users with large language models

Junqi Yin, Jesse Hines, Emily Herron, Tirthankar Ghosal, Hong Liu, Suzanne Prentice, Vanessa Lama, Feiyi Wang

Research output: Contribution to journal › Article › peer-review

Abstract

The ever-growing number of pre-trained large language models (LLMs) across scientific domains presents a challenge for application developers. While these models offer vast potential, fine-tuning them with custom data, aligning them for specific tasks, and evaluating their performance remain crucial steps for effective utilization. However, applying these techniques to models with tens of billions of parameters can take days or even weeks on modern workstations, making the cumulative cost of model comparison and evaluation a significant barrier to LLM-based application development. To address this challenge, we introduce an end-to-end pipeline specifically designed for building conversational and programmable AI agents on high performance computing (HPC) platforms. Our comprehensive pipeline encompasses model pre-training, fine-tuning, and web and API service deployment, along with crucial evaluations of lexical coherence, semantic accuracy, hallucination detection, and privacy considerations. We demonstrate our pipeline through the development of chatHPC, a chatbot for HPC question answering and script generation. Leveraging our scalable pipeline, we achieve end-to-end LLM alignment in under an hour on the Frontier supercomputer. We propose a novel self-improved, self-instruction method for instruction set generation, investigate scaling and fine-tuning strategies, and conduct a systematic evaluation of model performance. The established practices within chatHPC will serve as valuable guidance for future LLM-based application development on HPC platforms.
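The self-improved, self-instruction method mentioned in the abstract can be pictured as an iterative loop: the current model proposes new instructions from seed examples, the candidates are filtered, and the model is fine-tuned on the enlarged set before the next round. The sketch below is a minimal illustration of that loop, not the paper's implementation: every function name is hypothetical, the model call is stubbed out, and the fine-tuning step (run at scale on Frontier in the actual pipeline) is left as a comment.

```python
# Hypothetical sketch of a self-improved self-instruction loop.
# All names are illustrative; query_model() stands in for a call to the
# current model checkpoint served on an HPC platform.

import random

def query_model(prompt: str) -> str:
    """Placeholder for a call to the current model checkpoint."""
    return f"Generated instruction derived from: {prompt[:40]}..."

def generate_candidates(instructions, n_new=4):
    """Ask the current model to propose new instructions from sampled examples."""
    candidates = []
    for _ in range(n_new):
        examples = random.sample(instructions, k=min(2, len(instructions)))
        prompt = ("Write a new HPC-related instruction similar to:\n"
                  + "\n".join(examples))
        candidates.append(query_model(prompt))
    return candidates

def is_novel(instruction, existing):
    """Trivial novelty filter: drop exact duplicates of existing instructions."""
    return instruction not in existing

def self_instruct(seed_instructions, rounds=3):
    """Grow the instruction set over several rounds; in the real pipeline each
    round would be followed by fine-tuning, so later rounds sample from an
    improved model (the 'self-improved' part)."""
    instructions = list(seed_instructions)
    for _ in range(rounds):
        new = [c for c in generate_candidates(instructions)
               if is_novel(c, instructions)]
        instructions.extend(new)
        # fine_tune(model, instructions)  # omitted: done at scale on Frontier
    return instructions

if __name__ == "__main__":
    seeds = ["Explain how to submit a Slurm batch job.",
             "Generate a job script that requests 2 GPUs."]
    print(len(self_instruct(seeds)), "instructions after 3 rounds")
```

In practice the novelty filter would use semantic similarity rather than exact matching, and the loop's value comes from fine-tuning between rounds, which this stub omits.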

Original language: English
Article number: 194
Journal: Journal of Supercomputing
Volume: 81
Issue number: 1
DOIs
State: Published - Jan 2025

Funding

This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility at the Oak Ridge National Laboratory supported by the US Department of Energy under Contract No. DE-AC05-00OR22725.

Funders (funder number):
Office of Science
U.S. Department of Energy (DE-AC05-00OR22725)

Keywords

• HPC-to-LLM-agent
• High Performance Computing (HPC)
• LLM Alignment on HPC
• Large Language Model (LLM)
