Abstract
Reliable execution of scientific workflows is a fundamental concern in computational campaigns. Detecting and diagnosing anomalies is therefore important, and it is challenging for workflow executions that span complex, distributed computing infrastructures. In this paper, we model a scientific workflow as a directed acyclic graph (DAG) and apply graph neural networks (GNNs) to identify anomalies at both the workflow level and the individual job level. We further generalize our GNN model to perform anomaly detection jointly over a set of workflows rather than over a single specific workflow. Because they learn hidden representations not only from job features but also from the topological structure of the workflow, our GNN models achieve higher accuracy and better runtime efficiency than conventional machine learning models and other convolutional neural network approaches.
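The abstract does not specify the model architecture, so the following is only a hypothetical sketch of the general idea. It uses a standard graph convolutional network (GCN) propagation rule in plain Python. Each job is a node with a feature vector, and each job's features are mixed with those of its neighbours in the workflow DAG. A per-job anomaly score is then read out directly, and a per-workflow score is read out by mean pooling. The function names, feature values, and weights are illustrative assumptions. The weights are untrained, so the printed scores are not meaningful predictions.

```python
import math


def gcn_layer(num_nodes, edges, features, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Edges of the workflow DAG are treated as undirected here (an assumption),
    so each job aggregates features from both its parents and its children.
    """
    # Neighbour sets including a self-loop for every job (A + I).
    neighbours = [{i} for i in range(num_nodes)]
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    degree = [len(n) for n in neighbours]

    # Symmetrically normalized aggregation: D^-1/2 (A + I) D^-1/2 H.
    dim_in = len(features[0])
    aggregated = []
    for i in range(num_nodes):
        row = [0.0] * dim_in
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(dim_in):
                row[k] += norm * features[j][k]
        aggregated.append(row)

    # Linear transform by W, followed by an element-wise ReLU.
    dim_out = len(weight[0])
    return [
        [max(0.0, sum(row[k] * weight[k][c] for k in range(dim_in)))
         for c in range(dim_out)]
        for row in aggregated
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def job_scores(hidden, readout):
    """Job-level (node) anomaly probability computed from each hidden vector."""
    return [sigmoid(sum(h * w for h, w in zip(vec, readout))) for vec in hidden]


def workflow_score(hidden, readout):
    """Workflow-level (graph) anomaly probability via mean pooling over jobs."""
    dim = len(hidden[0])
    pooled = [sum(vec[c] for vec in hidden) / len(hidden) for c in range(dim)]
    return sigmoid(sum(p * w for p, w in zip(pooled, readout)))


if __name__ == "__main__":
    # Hypothetical diamond-shaped workflow: job 0 -> jobs 1, 2 -> job 3.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
    # Hypothetical normalized job features, e.g. [runtime, bytes_read].
    features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.1, 0.1]]
    W = [[1.0, -0.5], [0.5, 1.0]]  # untrained, arbitrary weights
    hidden = gcn_layer(4, edges, features, W)
    print("job scores:", job_scores(hidden, [1.0, 1.0]))
    print("workflow score:", workflow_score(hidden, [1.0, 1.0]))
```

The weights `W` and the readout vector do not depend on the number of jobs or on the shape of the DAG. The same parameters can therefore be applied to workflows of different structures, which is one plausible reading of the generalization to a set of workflows. This interpretation is an assumption, not a description of the authors' model.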
Original language | English |
---|---|
Title of host publication | Proceedings of WORKS 2022 |
Subtitle of host publication | 17th Workshop on Workflows in Support of Large-Scale Science, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 35-42 |
Number of pages | 8 |
ISBN (Electronic) | 9781665451918 |
State | Published - 2022 |
Externally published | Yes |
Event | 17th IEEE/ACM Workshop on Workflows in Support of Large-Scale Science, WORKS 2022, Dallas, United States (Nov 13 2022 → Nov 18 2022) |
Publication series
Name | Proceedings of WORKS 2022: 17th Workshop on Workflows in Support of Large-Scale Science, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis |
---|---|
Conference
Conference | 17th IEEE/ACM Workshop on Workflows in Support of Large-Scale Science, WORKS 2022 |
---|---|
Country/Territory | United States |
City | Dallas |
Period | 11/13/22 → 11/18/22 |
Funding
This work is funded by the Department of Energy under the Integrated Computational and Data Infrastructure (ICDI) for Scientific Discovery, grant #DE-SC0022328. Experimental data was collected on the ExoGENI testbed supported by NSF. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357.
Keywords
- Anomaly detection
- Graph neural networks
- Scientific workflows