Training data selection for event classification in a highly variable environment

Anand Iyer, Garrison Flynn, Nidhi Parikh, Daniel Archer, Thomas Karnowski, Monica Maceira, Omar Marcillo, Andrew Nicholson, Will Ray, Randall Wetherington, Michael Willis

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding), peer-reviewed


Abstract

A problem of interest for nuclear nonproliferation is monitoring activities at nuclear facilities, where proliferation events may take place only a few times and often under variable conditions. Machine learning has revolutionized data analytics by enabling measurable signatures to be used to build predictive models of facility operations. However, traditional methods for training these models require large, reliable data sets with labeled observations, which are difficult to obtain in nonproliferation settings. Highly variable conditions complicate this further, because events in the training data may have occurred under conditions quite different from those of the event of interest. Our hypothesis is that, when events occur in a highly variable environment, careful selection of training data for each test event could outperform the standard approach of using all available training data. We developed a method to optimize training data selection for a given test event and applied it to predicting the power level of the High Flux Isotope Reactor (HFIR) at Oak Ridge National Laboratory. HFIR reactor startups vary from one occurrence to the next because of natural variability in environmental conditions and operational procedures. Combining several analysis techniques, we performed a similitude assessment on data collected from HFIR to isolate clusters of events that were optimal for training a predictive model. Techniques such as dynamic time warping and Jaccard similarity were used in conjunction with clustering analysis. To validate this approach, the model was trained on every combination of unique training events, and its predictive performance was compared with the performance obtained using a subset of the training data selected from the isolated clusters found through the similitude assessment.
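The ingredients named in the abstract can be illustrated with a minimal sketch, assuming each event is a one-dimensional power-versus-time curve. It shows dynamic time warping (DTW) as a distance between a test event and each training event, Jaccard similarity between two sets of selected training events, and the exhaustive "every combination of training events" baseline. All function names (`dtw_distance`, `jaccard`, `select_similar`, `all_subsets`, `best_subset`) are hypothetical. The sketch swaps in a simple k-nearest-under-DTW selection rule in place of the paper's clustering step, and using Jaccard to compare the selected subset with the exhaustive-search winner is an assumed application, since the abstract does not say exactly how Jaccard similarity was applied. The toy curves are synthetic, not HFIR data.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, Sequence, Set


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with an absolute-difference local cost (an assumption)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def jaccard(s: Set[str], t: Set[str]) -> float:
    """|s & t| / |s | t|; defined here as 1.0 when both sets are empty."""
    union = s | t
    return 1.0 if not union else len(s & t) / len(union)


def select_similar(test: Sequence[float], train: Dict[str, Sequence[float]], k: int) -> Set[str]:
    """Hypothetical selection rule: keep the k training events closest to the test event under DTW."""
    ranked = sorted(train, key=lambda name: dtw_distance(test, train[name]))
    return set(ranked[:k])


def all_subsets(names: Sequence[str]) -> Iterator[Set[str]]:
    """Yield every non-empty combination of training events (2**n - 1 subsets)."""
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            yield set(combo)


def best_subset(names: Sequence[str], error_fn: Callable[[Set[str]], float]) -> Set[str]:
    """Exhaustive baseline: the first subset with the lowest caller-supplied error."""
    return min(all_subsets(names), key=error_fn)


if __name__ == "__main__":
    # Synthetic toy power-ramp curves, NOT HFIR data.
    test_curve = [0, 1, 2, 3, 3]
    train = {"A": [0, 1, 2, 3, 3, 3], "B": [0, 0, 1, 2, 3], "C": [0, 2, 4, 6, 6]}
    print({name: dtw_distance(test_curve, s) for name, s in train.items()})
    # -> {'A': 0.0, 'B': 0.0, 'C': 8.0}

    chosen = select_similar(test_curve, train, k=2)
    print(sorted(chosen))  # -> ['A', 'B']

    def toy_error(subset: Set[str]) -> float:
        # Stand-in for "train a model on this subset and measure its test error".
        return sum(dtw_distance(test_curve, train[n]) for n in subset) / len(subset)

    best = best_subset(list(train), toy_error)
    print(sorted(best), jaccard(chosen, best))  # -> ['A'] 0.5
```

DTW is useful here because two startups with the same power profile but a delayed or stretched ramp (events `A` and `B` above) align with zero cost, whereas a plain pointwise difference would penalize the time shift. The exhaustive baseline grows as 2**n - 1 in the number of training events, which is why it serves as a validation reference rather than a practical selection method.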

Original language: English
Title of host publication: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications IV
Editors: Tien Pham, Latasha Solomon
Publisher: SPIE
ISBN (Electronic): 9781510651029
State: Published - 2022
Event: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications IV 2022 - Virtual, Online
Duration: Jun 6 2022 to Jun 12 2022

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12113
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications IV 2022
City: Virtual, Online
Period: 6/6/22 to 6/12/22

Keywords

  • Jaccard
  • dynamic time warping
  • k-means
  • similitude
  • supervised learning
  • unsupervised learning

