TY - GEN
T1 - Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization
AU - Egelé, Romain
AU - Guyon, Isabelle
AU - Vishwanath, Venkatram
AU - Balaprakash, Prasanna
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Bayesian optimization (BO) is a promising approach for hyperparameter optimization of deep neural networks (DNNs), where training each model can take minutes to hours. In BO, a computationally cheap surrogate model is employed to learn the relationship between hyperparameter configurations and their performance, such as accuracy. Parallel BO methods often adopt a single-manager/multiple-workers strategy to evaluate several hyperparameter configurations simultaneously. Even when each hyperparameter evaluation is time-consuming, the manager overhead in such centralized schemes prevents these methods from scaling to large numbers of workers. We present an asynchronous decentralized BO method, in which each worker runs sequential BO and asynchronously communicates its results through shared storage. We scale our method to 1,920 parallel workers (the full production queue of the Polaris supercomputer) without loss of computational efficiency, sustaining above 95% worker utilization, and demonstrate improved model accuracy as well as faster convergence on the CANDLE benchmark from the Exascale Computing Project.
KW - Bayesian optimization
KW - asynchronous parallel computing
KW - hyperparameter optimization
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85174236986&partnerID=8YFLogxK
U2 - 10.1109/e-Science58273.2023.10254839
DO - 10.1109/e-Science58273.2023.10254839
M3 - Conference contribution
AN - SCOPUS:85174236986
T3 - Proceedings - 2023 IEEE 19th International Conference on e-Science, e-Science 2023
BT - Proceedings - 2023 IEEE 19th International Conference on e-Science, e-Science 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE International Conference on e-Science, e-Science 2023
Y2 - 9 October 2023 through 14 October 2023
ER -