An Active Learning-Based Streaming Pipeline for Reduced Data Training of Structure Finding Models in Neutron Diffractometry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Structure determination workloads in neutron diffractometry are computationally expensive and routinely take several hours to many days to determine the structure of a material from its neutron diffraction patterns. The potential for machine learning models trained on simulated neutron scattering patterns to significantly speed up these tasks has been reported recently. However, the amount of simulated data needed to train these models grows exponentially with the number of structural parameters to be predicted, which poses a significant computational challenge. To overcome this challenge, we introduce a novel batch-mode active learning (AL) policy that uses uncertainty sampling to simulate training data drawn from a probability distribution that prefers labelled examples about which the model is least certain. We confirm its efficacy in training the same models with ∼75% less training data while improving accuracy. We then discuss the design of an efficient stream-based training workflow that uses this AL policy and present a performance study on two heterogeneous platforms, demonstrating that, compared with a conventional training workflow, the streaming workflow delivers ∼20% shorter training time without any loss of accuracy.
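The batch-mode uncertainty sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' policy, whose details are not given on this page. It assumes that each candidate parameter set already has a scalar uncertainty score, and that the sampling distribution is a softmax over those scores. The function name `uncertainty_sampling_batch` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def uncertainty_sampling_batch(uncertainty, batch_size, temperature=1.0, seed=0):
    """Draw a batch of candidate indices, favoring high-uncertainty candidates.

    Assumption: `uncertainty` holds one non-negative score per candidate
    (for example, the spread of predictions across an ensemble).
    """
    u = np.asarray(uncertainty, dtype=float)
    # Softmax over the scores gives the sampling distribution (assumed form).
    # Subtracting the maximum keeps the exponentials numerically stable.
    logits = (u - u.max()) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    rng = np.random.default_rng(seed)
    # Sample without replacement so a candidate appears at most once per batch.
    idx = rng.choice(u.size, size=batch_size, replace=False, p=probs)
    return idx, probs


if __name__ == "__main__":
    scores = [0.1, 0.2, 5.0, 0.3]  # made-up uncertainty scores
    batch, probs = uncertainty_sampling_batch(scores, batch_size=2)
    print("selected candidates:", batch)
    print("sampling probabilities:", probs)
```

In a loop of this kind, the selected candidates would be passed to the simulator to generate new training examples, the model would be retrained or updated, and the uncertainty scores would be recomputed before the next batch is drawn. Sampling from a distribution, rather than always taking the top-scoring candidates, keeps some diversity in each batch.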

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Editors: Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1873-1882
Number of pages: 10
ISBN (Electronic): 9798350362480
DOIs
State: Published - 2024
Event: 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States
Duration: Dec 15 2024 to Dec 18 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
ISSN (Print): 2639-1589
ISSN (Electronic): 2573-2978

Conference

Conference: 2024 IEEE International Conference on Big Data, BigData 2024
Country/Territory: United States
City: Washington
Period: 12/15/24 to 12/18/24

Funding

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-publicaccess-plan). This research used resources at the Argonne Leadership Computing Facility, the National Energy Research Scientific Computing Center, and the Spallation Neutron Source, which are DOE Office of Science User Facilities, as well as resources at Oak Ridge National Laboratory and Brookhaven National Laboratory, which are DOE Office of Science National Laboratories. This research was sponsored by the ExaLearn Co-Design Project, an Exascale Computing Project, DOE.
