Adaptive Generation of Training Data for ML Reduced Model Creation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Machine learning proxy models are often used to speed up or completely replace complex computational models. Their greatly reduced and deterministic computational cost enables new use cases such as digital twin control systems and global optimization. The challenge in building these proxy models is generating the training data. A naive uniform sampling of the input space can result in a non-uniform sampling of a model's output space, leaving gaps in training data coverage that miss finer-scale details and result in poor accuracy. While ever-larger data sets could eventually fill these gaps, the computational burden of full-scale simulation codes can make this prohibitive. In this paper, we present an adaptive data generation method that uses uncertainty estimation to identify regions where training data should be augmented. By targeting data generation to areas of need, representative data sets can be generated efficiently. The effectiveness of this method is demonstrated on a simple one-dimensional function and a complex multidimensional physics model.
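The adaptive loop the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the test function, the choice of a random-forest ensemble, and the use of tree-to-tree disagreement as the uncertainty estimate are all illustrative assumptions.

```python
# Illustrative sketch only (not the authors' method): adaptively grow a
# training set by sampling where an ensemble proxy is least certain.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def simulator(x):
    # Stand-in for an expensive simulation: smooth trend plus a
    # fine-scale bump that uniform sampling tends to miss.
    return np.sin(3 * x) + 0.5 * np.exp(-200 * (x - 0.6) ** 2)

# Start from a small uniform sample of the input space.
X = rng.uniform(0.0, 1.0, 8)
y = simulator(X)

candidates = np.linspace(0.0, 1.0, 501)
for _ in range(10):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X.reshape(-1, 1), y)
    # Uncertainty estimate: spread of the individual trees' predictions.
    preds = np.stack([tree.predict(candidates.reshape(-1, 1))
                      for tree in model.estimators_])
    uncertainty = preds.std(axis=0)
    # Augment the training data at the most uncertain candidate point.
    x_new = candidates[np.argmax(uncertainty)]
    X = np.append(X, x_new)
    y = np.append(y, simulator(x_new))
```

Each iteration costs one simulator evaluation, placed where it is expected to help most, rather than spending evaluations uniformly across the input space.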

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3408-3416
Number of pages: 9
ISBN (Electronic): 9781665480451
DOIs
State: Published - 2022
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: Dec 17 2022 – Dec 20 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

Conference

Conference: 2022 IEEE International Conference on Big Data, Big Data 2022
Country/Territory: Japan
City: Osaka
Period: 12/17/22 – 12/20/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Funding

Notice of Copyright: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders: U.S. Department of Energy

Keywords

• component
• formatting
• insert
• style
• styling

