Opportunities for retrieval and tool augmented large language models in scientific facilities

Michael H. Prince, Henry Chan, Aikaterini Vriza, Tao Zhou, Varuni K. Sastry, Yanqi Luo, Matthew T. Dearing, Ross J. Harder, Rama K. Vasudevan, Mathew J. Cherukara

Research output: Contribution to journal › Article › peer-review

Abstract

Upgrades to advanced scientific user facilities such as next-generation x-ray light sources, nanoscience centers, and neutron facilities are revolutionizing our understanding of materials across the spectrum of the physical sciences, from life sciences to microelectronics. However, these facility and instrument upgrades come with a significant increase in complexity. Driven by more exacting scientific needs, instruments and experiments become more intricate each year. This increased operational complexity makes it ever more challenging for domain scientists to design experiments that effectively leverage the capabilities of and operate on these advanced instruments. Large language models (LLMs) can perform complex information retrieval, assist in knowledge-intensive tasks across applications, and provide guidance on tool usage. Using x-ray light sources, leadership computing, and nanoscience centers as representative examples, we describe preliminary experiments with a Context-Aware Language Model for Science (CALMS) to assist scientists with instrument operations and complex experimentation. With the ability to retrieve relevant information from facility documentation, CALMS can answer simple questions on scientific capabilities and other operational procedures. With the ability to interface with software tools and experimental hardware, CALMS can conversationally operate scientific instruments. By making information more accessible and acting on user needs, LLMs could expand and diversify scientific facilities’ users and accelerate scientific output.

Original language: English
Article number: 251
Journal: npj Computational Materials
Volume: 10
Issue number: 1
State: Published - Dec 2024

Funding

Work performed at the Center for Nanoscale Materials and Advanced Photon Source, both U.S. Department of Energy Office of Science User Facilities, was supported by the U.S. DOE, Office of Basic Energy Sciences, under Contract No. DE-AC02-06CH11357. This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357. M.J.C. and H.C. also acknowledge support from the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Data, Artificial Intelligence, and Machine Learning at DOE Scientific User Facilities program under Award Number 34532. GOVERNMENT LICENSE: The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan. The scanning probe microscopy research was supported by the Center for Nanophase Materials Sciences (CNMS), which is a US Department of Energy, Office of Science User Facility at Oak Ridge National Laboratory.

Funders | Funder number
U.S. DOE Office of Science-Advanced Scientific Computing Research Program
Oak Ridge National Laboratory
Center for Nanophase Materials Sciences
Center for Nanoscale Materials and Advanced Photon Source
U.S. Department of Energy
Argonne National Laboratory
Office of Basic Energy Sciences Data, Artificial Intelligence
Office of Science
Basic Energy Sciences | DE-AC02-06CH11357
Basic Energy Sciences
Machine Learning | 34532