TY - GEN
T1 - Data federation challenges in remote near-real-time fusion experiment data processing
AU - Choi, Jong
AU - Wang, Ruonan
AU - Churchill, R. Michael
AU - Kube, Ralph
AU - Choi, Minjun
AU - Park, Jinseop
AU - Logan, Jeremy
AU - Mehta, Kshitij
AU - Eisenhauer, Greg
AU - Podhorszki, Norbert
AU - Wolf, Matthew
AU - Chang, C. S.
AU - Klasky, Scott
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2021
Y1 - 2021
N2 - Fusion energy experiments and simulations provide critical information needed to plan future fusion reactors. As next-generation devices like ITER move toward long-pulse experiments, analyses, including AI and ML, should be performed in a wide range of time and computing constraints, from near-real-time constraints, between-shot analysis, and to campaign-wide long-term analysis. However, the data volume, velocity, and variety make it extremely challenging for analyses using only local computational resources. Researchers need the ability to compose and execute workflows spanning edge resources to large-scale high-performance computing facilities. We present Delta, a system to address data analysis challenges, including AI/ML, in fusion science, by leveraging the ADIOS I/O library and middleware, to support executing science workflows over the wide area network for near-real-time streaming. We discuss the data federation challenges in performing remote workflows, focusing on on-going research work in (1) managing, reducing, and streaming data to minimize I/O and data movement overheads, (2) decompressing and reorganizing data for analysis, and (3) executing workflows for automated data analysis. We introduce examples for deep-learning based data analysis for the fusion domain and demonstrate how we use Delta to construct end-to-end workflows for a fusion device in Korea, connecting a remote DOE facility in the USA. The capability demonstrated by this project is the basis for improving the state of the art for near-real-time data federation amongst remote facilities.
AB - Fusion energy experiments and simulations provide critical information needed to plan future fusion reactors. As next-generation devices like ITER move toward long-pulse experiments, analyses, including AI and ML, should be performed in a wide range of time and computing constraints, from near-real-time constraints, between-shot analysis, and to campaign-wide long-term analysis. However, the data volume, velocity, and variety make it extremely challenging for analyses using only local computational resources. Researchers need the ability to compose and execute workflows spanning edge resources to large-scale high-performance computing facilities. We present Delta, a system to address data analysis challenges, including AI/ML, in fusion science, by leveraging the ADIOS I/O library and middleware, to support executing science workflows over the wide area network for near-real-time streaming. We discuss the data federation challenges in performing remote workflows, focusing on on-going research work in (1) managing, reducing, and streaming data to minimize I/O and data movement overheads, (2) decompressing and reorganizing data for analysis, and (3) executing workflows for automated data analysis. We introduce examples for deep-learning based data analysis for the fusion domain and demonstrate how we use Delta to construct end-to-end workflows for a fusion device in Korea, connecting a remote DOE facility in the USA. The capability demonstrated by this project is the basis for improving the state of the art for near-real-time data federation amongst remote facilities.
KW - Data federation
KW - Data streams
KW - Fusion
KW - Remote data analysis
UR - http://www.scopus.com/inward/record.url?scp=85106110804&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63393-6_19
DO - 10.1007/978-3-030-63393-6_19
M3 - Conference contribution
AN - SCOPUS:85106110804
SN - 9783030633929
T3 - Communications in Computer and Information Science
SP - 285
EP - 299
BT - Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, Revised Selected Papers
A2 - Nichols, Jeffrey
A2 - Maccabe, Arthur ‘Barney’
A2 - Parete-Koon, Suzanne
A2 - Verastegui, Becky
A2 - Hernandez, Oscar
A2 - Ahearn, Theresa
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020
Y2 - 26 August 2020 through 28 August 2020
ER -