Scientific machine learning benchmarks

Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey

Research output: Contribution to journal, Review article, peer-review

64 Scopus citations

Abstract

Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data. In upcoming experimental facilities, such as the Extreme Photonics Application Centre (EPAC) in the UK or the international Square Kilometre Array (SKA), the rate of data generation and the scale of data volumes will increasingly require the use of more automated data analysis. However, at present, identifying the most appropriate machine learning algorithm for the analysis of any given scientific dataset is a challenge due to the potential applicability of many different machine learning frameworks, computer architectures and machine learning models. Historically, for modelling and simulation on high-performance computing systems, these issues have been addressed through benchmarking computer applications, algorithms and architectures. Extending such a benchmarking approach and identifying metrics for the application of machine learning methods to open, curated scientific datasets is a new challenge for both scientists and computer scientists. Here, we introduce the concept of machine learning benchmarks for science and review existing approaches. As an example, we describe the SciMLBench suite of scientific machine learning benchmarks.

Original language: English
Pages (from-to): 413-420
Number of pages: 8
Journal: Nature Reviews Physics
Volume: 4
Issue number: 6
State: Published - Jun 2022

Funding

We would like to thank Samuel Jackson, Kuangdai Leng, Keith Butler and Juri Papay from the Scientific Machine Learning Group at the Rutherford Appleton Laboratory, Junqi Yin and Aristeidis Tsaris from Oak Ridge National Laboratory and the MLCommons Science Working Group for valuable discussions. This work was supported by Wave 1 of the UKRI Strategic Priorities Fund under the EPSRC grant EP/T001569/1, particularly the ‘AI for Science’ theme within that grant, by the Alan Turing Institute and by the Benchmarking for AI for Science at Exascale (BASE) project under the EPSRC grant EP/V001310/1. This research also used resources from the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science user facility supported under contract DE-AC05-00OR22725 and from the Science and Technology Facilities Council, particularly that of the Pearl AI resource.

Funders (with funder number where listed)
Pearl AI
Scientific Machine Learning Group
Office of Science: DE-AC05-00OR22725
Oak Ridge National Laboratory
Alan Turing Institute: EP/V001310/1
UK Research and Innovation
Engineering and Physical Sciences Research Council: EP/T001569/1
Science and Technology Facilities Council
