Abstract
Large-scale international scientific collaborations, such as ATLAS, Belle II, CMS, and DUNE, generate vast volumes of data. These experiments necessitate substantial computational power for varied tasks, including structured data processing, Monte Carlo simulations, and end-user analysis. Centralized workflow and data management systems are employed to handle these demands, but current decision-making processes for data placement and payload allocation are often heuristic and disjointed. This optimization challenge potentially could be addressed using contemporary machine learning methods, such as reinforcement learning, which, in turn, require access to extensive data and an interactive environment. Instead, we propose a generative surrogate modeling approach to address the lack of training data and concerns about privacy preservation. We have collected and processed real-world job submission records, totaling more than two million jobs through 150 days, and applied four generative models for tabular data - TVAE, CTAGGAN+, SMOTE, and TabDDPM - to these datasets, thoroughly evaluating their performance. Along with measuring the discrepancy among feature-wise distributions separately, we also evaluate pair-wise feature correlations, distance to closest record, and responses to pre-trained models. Our experiments indicate that SMOTE and TabDDPM can generate similar tabular data, almost indistinguishable from the ground truth. Yet, as a non-learning method, SMOTE ranks the lowest in privacy preservation. As a result, we conclude that the probabilistic-diffusion-model-based TabDDPM is the most suitable generative model for managing job record data.
Original language | English |
---|---|
Title of host publication | Proceedings of SC 2024-W |
Subtitle of host publication | Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 79-86 |
Number of pages | 8 |
ISBN (Electronic) | 9798350355543 |
DOIs | |
State | Published - 2024 |
Event | 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024 - Atlanta, United States Duration: Nov 17 2024 → Nov 22 2024 |
Publication series
Name | Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis |
---|
Conference
Conference | 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 11/17/24 → 11/22/24 |
Funding
This material is based on work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DESC- 0012704.
Keywords
- AI-based Performance Modeling
- Distributed Workflows
- High-Performance Computing
- Simulation
- Surrogate Models