SciTrust: Evaluating the Trustworthiness of Large Language Models for Science

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

This work presents SciTrust, a comprehensive framework for assessing the trustworthiness of large language models (LLMs) in scientific contexts, with a focus on truthfulness, accuracy, hallucination, and sycophancy. The framework introduces four novel open-ended benchmarks in Computer Science, Chemistry, Biology, and Physics, and employs a multi-faceted evaluation approach combining traditional metrics with LLMbased evaluation. SciTrust was applied to five LLMs, including one general-purpose and four scientific models, revealing nuanced strengths and weaknesses across different models and benchmarks. The study also evaluated SciTrust's performance and scalability on high-performance computing systems. Results showed varying performance across models, with Llama3-70B-Instruct performing strongly overall, while Galactica-120B and SciGLM-6B excelled among scientific models. SciTrust aims to advance the development of trustworthy AI in scientific applications and establish a foundation for future research on model robustness, safety, and ethics in scientific contexts. We have open-sourced our framework, including all associated scripts and datasets, at https://github.com/herronej/SciTrust.

Original languageEnglish
Title of host publicationProceedings of SC 2024-W
Subtitle of host publicationWorkshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages72-78
Number of pages7
ISBN (Electronic)9798350355543
DOIs
StatePublished - 2024
Event2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024 - Atlanta, United States
Duration: Nov 17 2024Nov 22 2024

Publication series

NameProceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Country/TerritoryUnited States
CityAtlanta
Period11/17/2411/22/24

Keywords

  • High Performance Computing
  • Large Language Models for Science
  • Trustworthy AI

Fingerprint

Dive into the research topics of 'SciTrust: Evaluating the Trustworthiness of Large Language Models for Science'. Together they form a unique fingerprint.

Cite this