TY - GEN
T1 - Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds
AU - Luszczek, Piotr
AU - Kurzak, Jakub
AU - Yamazaki, Ichitaro
AU - Keffer, David
AU - Dongarra, Jack
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - In this article, we present an autotuning approach applied to systematic performance engineering of the EM-ICP (Expectation-Maximization Iterative Closest Point) algorithm for the point set registration problem. We show how we were able to exceed the performance achieved by the reference code through multiple dependence transformations and automated procedure of generating and evaluating numerous implementation variants. Furthermore, we also managed to exploit code transformations that are not that common during manual optimization but yielded better performance in our tests for the EM-ICP algorithm. Finally, we maintained high levels of performance rate in a portable fashion across a wide range of HPC hardware platforms including multicore, many-core, and GPU-based accelerators. More importantly, the results indicate consistently high performance level and ability to move the task of data analysis through point-set registration to any modern compute platform without the concern of inferior asymptotic efficiency.
AB - In this article, we present an autotuning approach applied to systematic performance engineering of the EM-ICP (Expectation-Maximization Iterative Closest Point) algorithm for the point set registration problem. We show how we were able to exceed the performance achieved by the reference code through multiple dependence transformations and automated procedure of generating and evaluating numerous implementation variants. Furthermore, we also managed to exploit code transformations that are not that common during manual optimization but yielded better performance in our tests for the EM-ICP algorithm. Finally, we maintained high levels of performance rate in a portable fashion across a wide range of HPC hardware platforms including multicore, many-core, and GPU-based accelerators. More importantly, the results indicate consistently high performance level and ability to move the task of data analysis through point-set registration to any modern compute platform without the concern of inferior asymptotic efficiency.
KW - Autotuning with code generation
KW - Point set registration
KW - Portable performance engineering
UR - http://www.scopus.com/inward/record.url?scp=85047826990&partnerID=8YFLogxK
U2 - 10.1109/BigData.2017.8258258
DO - 10.1109/BigData.2017.8258258
M3 - Conference contribution
AN - SCOPUS:85047826990
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 2893
EP - 2902
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -