TY - GEN
T1 - Low complex & high accuracy computation approximations to enable on-device RNN applications
AU - Pasupuleti, Sirish Kumar
AU - Gadde, Raj Narayana
AU - Rajagopal, Vasanthakumar
AU - Vishnoi, Ashok
AU - Chandra Sekhar, N.
AU - Chandra Kumar, R.
AU - Miniskar, Narasinga Rao
N1 - Publisher Copyright:
© 2019 IEEE
PY - 2019
Y1 - 2019
N2 - Recurrent Neural Networks (RNNs) have demonstrated excellent results on various Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) tasks. However, executing RNNs requires large amounts of memory and computation, which makes it difficult to achieve real-time performance on low-power devices such as smartphones. Hence, ASR and NLP applications such as voice assistants currently rely on cloud-based solutions. In this paper, to enable on-device inference, we propose efficient approximations for the weights of fully connected (FC) layers and for activation functions to reduce computational complexity. The proposed approximations eliminate multiplications, divisions and exponential operations by replacing them with simple arithmetic operations (shifts and additions), significantly reducing the computation requirements without any perceivable loss of functional accuracy. The approximations also reduce memory size and bandwidth requirements. We further present a lightweight VLIW-based DSP architecture incorporating these approximations to enable on-device inference. The approximations have been tested on the proposed DSP with various RNN applications such as EESEN, LRCN and S2VT. The results with approximations show accuracies similar to the 32-bit float reference, ∼8x-12x performance gains, and ∼2x-4x gains in memory requirement and bandwidth. Moreover, the activation approximation results show better average and peak errors than the state of the art.
AB - Recurrent Neural Networks (RNNs) have demonstrated excellent results on various Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) tasks. However, executing RNNs requires large amounts of memory and computation, which makes it difficult to achieve real-time performance on low-power devices such as smartphones. Hence, ASR and NLP applications such as voice assistants currently rely on cloud-based solutions. In this paper, to enable on-device inference, we propose efficient approximations for the weights of fully connected (FC) layers and for activation functions to reduce computational complexity. The proposed approximations eliminate multiplications, divisions and exponential operations by replacing them with simple arithmetic operations (shifts and additions), significantly reducing the computation requirements without any perceivable loss of functional accuracy. The approximations also reduce memory size and bandwidth requirements. We further present a lightweight VLIW-based DSP architecture incorporating these approximations to enable on-device inference. The approximations have been tested on the proposed DSP with various RNN applications such as EESEN, LRCN and S2VT. The results with approximations show accuracies similar to the 32-bit float reference, ∼8x-12x performance gains, and ∼2x-4x gains in memory requirement and bandwidth. Moreover, the activation approximation results show better average and peak errors than the state of the art.
KW - On-Device Inference
KW - Sigmoid TanH piece-wise approximations
KW - Weights approximations as shifts
UR - http://www.scopus.com/inward/record.url?scp=85066815953&partnerID=8YFLogxK
U2 - 10.1109/ISCAS.2019.8702528
DO - 10.1109/ISCAS.2019.8702528
M3 - Conference contribution
AN - SCOPUS:85066815953
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019
Y2 - 26 May 2019 through 29 May 2019
ER -