TY - GEN
T1 - Accurate and Efficient Fixed Point Inference for Deep Neural Networks
AU - Rajagopal, Vasanthakumar
AU - Ramasamy, Chandra Kumar
AU - Vishnoi, Ashok
AU - Gadde, Raj Narayana
AU - Miniskar, Narasinga Rao
AU - Pasupuleti, Sirish Kumar
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/29
Y1 - 2018/8/29
N2 - Deploying DNNs on embedded devices is challenging because of their high memory and computational requirements. Performing DNN inference in lower-bit-width fixed-point arithmetic is seen as a crucial step towards realizing DNNs on embedded devices. State-of-the-art methods achieve floating-point accuracy using re-training and complex activation normalization schemes. In this paper we propose an accurate and efficient end-to-end DNN inference method based on 16-bit fixed-point arithmetic. We show that floating-point accuracy can be achieved, without re-training, by a simple quantization method that uses powers of 2 as scale factors, coupled with our optimal bit-width estimation algorithm. Additionally, this quantization leads to efficient activation normalization using only arithmetic shifts. We show that the combination of our quantization method and activation normalization maximizes SIMD throughput, resulting in a 2x to 6x gain in execution time compared to floating-point inference. Experimental results demonstrate that our method generalizes to different networks, giving the same or better accuracy than floating point for classification, regression and recurrent networks.
AB - Deploying DNNs on embedded devices is challenging because of their high memory and computational requirements. Performing DNN inference in lower-bit-width fixed-point arithmetic is seen as a crucial step towards realizing DNNs on embedded devices. State-of-the-art methods achieve floating-point accuracy using re-training and complex activation normalization schemes. In this paper we propose an accurate and efficient end-to-end DNN inference method based on 16-bit fixed-point arithmetic. We show that floating-point accuracy can be achieved, without re-training, by a simple quantization method that uses powers of 2 as scale factors, coupled with our optimal bit-width estimation algorithm. Additionally, this quantization leads to efficient activation normalization using only arithmetic shifts. We show that the combination of our quantization method and activation normalization maximizes SIMD throughput, resulting in a 2x to 6x gain in execution time compared to floating-point inference. Experimental results demonstrate that our method generalizes to different networks, giving the same or better accuracy than floating point for classification, regression and recurrent networks.
KW - Deep Neural Networks
KW - Fixed Point Arithmetic
KW - Inference
KW - Quantization
UR - http://www.scopus.com/inward/record.url?scp=85062923672&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2018.8451268
DO - 10.1109/ICIP.2018.8451268
M3 - Conference contribution
AN - SCOPUS:85062923672
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1847
EP - 1851
BT - 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings
PB - IEEE Computer Society
T2 - 25th IEEE International Conference on Image Processing, ICIP 2018
Y2 - 7 October 2018 through 10 October 2018
ER -