TY - GEN
T1 - Modeling I/O Performance Variability Using Conditional Variational Autoencoders
AU - Madireddy, Sandeep
AU - Balaprakash, Prasanna
AU - Carns, Philip
AU - Latham, Robert
AU - Ross, Robert
AU - Snyder, Shane
AU - Wild, Stefan
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/10/29
Y1 - 2018/10/29
N2 - Storage system performance modeling is crucial for efficient use of heterogeneous shared resources on leadership-class computers. Variability in application performance, particularly variability arising from concurrent applications sharing I/O resources, is a major hurdle in the development of accurate performance models. We adopt a deep learning approach based on conditional variational autoencoders (CVAE) for I/O performance modeling, and use it to quantify performance variability. We illustrate our approach using data collected on Edison, a production supercomputing system at the National Energy Research Scientific Computing Center (NERSC). The CVAE approach is investigated by comparing it to a previously proposed sensitivity-based Gaussian process (GP) model. We find that the CVAE model performs slightly better than the GP model in cases where training and testing data come from different applications, since CVAE can inherently leverage all of the data from multiple applications whereas GP partitions the data and builds separate models for each partition. Hence, the CVAE offers an alternative modeling approach that does not need pre-processing; it has enough flexibility to handle data from a wide variety of applications without changing the inference approach.
KW - I/O performance variability
KW - Parallel filesystems
KW - Probabilistic machine learning
KW - Variational autoencoders
UR - http://www.scopus.com/inward/record.url?scp=85057278273&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2018.00022
DO - 10.1109/CLUSTER.2018.00022
M3 - Conference contribution
AN - SCOPUS:85057278273
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 109
EP - 113
BT - Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Y2 - 10 September 2018 through 13 September 2018
ER -