Achieving GPT-4o level performance in astronomy with a specialized 8B-parameter large language model

Tijmen de Haan, Yuan Sen Ting, Tirthankar Ghosal, Tuan Dung Nguyen, Alberto Accomazzi, Azton Wells, Nesar Ramachandra, Rui Pan, Zechang Sun

Research output: Contribution to journal › Article › peer-review

Abstract

AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, cosmology, and astronomical instrumentation. Trained on the complete collection of astronomy-related arXiv papers from 2007 to 2024, along with millions of synthetically generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-8B demonstrates remarkable proficiency across a wide range of astronomy questions. It scores 80.9% on the AstroMLab-1 benchmark, greatly outperforming all proprietary and open-weight models in the 8-billion-parameter class and performing on par with GPT-4o. This achievement demonstrates the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models. AstroSage-Llama-3.1-8B is freely available, enabling widespread access to advanced AI capabilities for astronomical education and research.
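
Because the model weights are freely released, a reader can query AstroSage-Llama-3.1-8B directly. Below is a minimal sketch using the Hugging Face transformers library; the repository id AstroMLab/AstroSage-8B, the chat template usage, and the system prompt are illustrative assumptions based on common Llama-3.1 release conventions, not details confirmed by this page.

    # Minimal sketch: querying AstroSage-Llama-3.1-8B via Hugging Face transformers.
    # The repo id "AstroMLab/AstroSage-8B" is an assumption; check the actual release.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AstroMLab/AstroSage-8B"  # assumed repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # an 8B model fits on a single ~24 GB GPU in bf16
        device_map="auto",
    )

    # Llama-3.1-derived models ship a chat template; apply it to format the prompt.
    messages = [
        {"role": "system", "content": "You are an expert astronomy research assistant."},
        {"role": "user", "content": "What sets the Sunyaev-Zel'dovich effect apart "
                                    "from other CMB foregrounds?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

Greedy decoding (do_sample=False) is used here for reproducibility; sampling parameters can be adjusted for more exploratory answers.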

Original language: English
Article number: 13751
Journal: Scientific Reports
Volume: 15
Issue number: 1
DOI:
State: Published - Dec 2025

Funding

This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility at Oak Ridge National Laboratory supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725, with additional support from Microsoft's Accelerating Foundation Models Research (AFMR) program. TdH was supported by the World Premier International Research Center Initiative (WPI), MEXT, Japan. YST is supported by the National Science Foundation under Grant No. 2406729. Work at Argonne National Laboratory is supported by UChicago Argonne LLC, operator of Argonne National Laboratory. Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. Special thanks go to Cassie Reuter and Joshua Montgomery for acting as independent evaluators.

Keywords

  • AI assistant
  • Continued pretraining
  • Large language model
  • Supervised fine-tuning
