TY - GEN
T1 - Fast Training of Deep Neural Networks for Speech Recognition
AU - Cong, Guojing
AU - Kingsbury, Brian
AU - Yang, Chih Chieh
AU - Liu, Tianyi
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - Training large, deep neural network acoustic models for speech recognition on large datasets takes a long time on a single GPU, motivating research on parallel training algorithms. We present an approach for training a bidirectional LSTM acoustic model on the 2000-hour Switchboard corpus. The model we train achieves state-of-the-art word error rate, 7.5% on the Hub5-2000 Switchboard test set and 13.1% on the Callhome test set, and scales to an unprecedented 96 learners while employing only 12 global reductions per epoch of training. As our implementation incurs far fewer reductions than prior work, it does not require aggressively optimized communication primitives to reach state-of-the-art performance in a short amount of time. With 48 NVIDIA V100 GPUs, training takes 5 hours; with 96 GPUs, training takes around 3 hours.
AB - Training large, deep neural network acoustic models for speech recognition on large datasets takes a long time on a single GPU, motivating research on parallel training algorithms. We present an approach for training a bidirectional LSTM acoustic model on the 2000-hour Switchboard corpus. The model we train achieves state-of-the-art word error rate, 7.5% on the Hub5-2000 Switchboard test set and 13.1% on the Callhome test set, and scales to an unprecedented 96 learners while employing only 12 global reductions per epoch of training. As our implementation incurs far fewer reductions than prior work, it does not require aggressively optimized communication primitives to reach state-of-the-art performance in a short amount of time. With 48 NVIDIA V100 GPUs, training takes 5 hours; with 96 GPUs, training takes around 3 hours.
UR - https://www.scopus.com/pages/publications/85089242150
U2 - 10.1109/ICASSP40776.2020.9053993
DO - 10.1109/ICASSP40776.2020.9053993
M3 - Conference contribution
AN - SCOPUS:85089242150
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6884
EP - 6888
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -