Abstract
Storage system performance modeling is crucial for efficient use of heterogeneous shared resources on leadership-class computers. Variability in application performance, particularly variability arising from concurrent applications sharing I/O resources, is a major hurdle in the development of accurate performance models. We adopt a deep learning approach based on conditional variational autoencoders (CVAEs) for I/O performance modeling and use it to quantify performance variability. We illustrate our approach using data collected on Edison, a production supercomputing system at the National Energy Research Scientific Computing Center (NERSC). We evaluate the CVAE approach by comparing it with a previously proposed sensitivity-based Gaussian process (GP) model. We find that the CVAE model performs slightly better than the GP model in cases where training and testing data come from different applications, since the CVAE can inherently leverage the full dataset from multiple applications, whereas the GP model partitions the data and builds a separate model for each partition. Hence, the CVAE offers an alternative modeling approach that does not need pre-processing; it has enough flexibility to handle data from a wide variety of applications without changing the inference approach.
Original language | English |
---|---|
Title of host publication | Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 109-113 |
Number of pages | 5 |
ISBN (Electronic) | 9781538683194 |
State | Published - Oct 29 2018 |
Externally published | Yes |
Event | 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018 - Belfast, United Kingdom; Duration: Sep 10 2018 → Sep 13 2018 |
Publication series
Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
---|---|
Volume | 2018-September |
ISSN (Print) | 1552-5244 |
Conference
Conference | 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018 |
---|---|
Country/Territory | United Kingdom |
City | Belfast |
Period | 09/10/18 → 09/13/18 |
Funding
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.
Keywords
- I/O performance variability
- Parallel filesystems
- Probabilistic machine learning
- Variational autoencoders