TY - GEN
T1 - Performance of Floating-point Intensive Kernels on Low-power Processor: A Case Study with Geodesic Distance Kernel
AU - Jin, Zheming
AU - Velesko, Paulius
AU - Finkel, Hal
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - A processor with a GPU and a CPU integrated on the same chip is a promising low-power system for floating-point intensive applications. While an integrated GPU is not designed to outperform a discrete GPU due to its power, area, and thermal constraints, there is a need to better understand the performance of a floating-point intensive kernel on an integrated GPU. Toward this end, we choose a representative floating-point intensive kernel as a case study. We port the kernel with a vendor-neutral framework, analyze the compiler optimizations of the kernel at the assembly-code level, evaluate the relationship between floating-point operations per second and arithmetic intensity, and compare the performance and power of the kernel implementations on the CPU and GPU. Our key findings are: 1) Compared to an unoptimized kernel, the floating-point optimizations improve the performance of the single- and double-precision floating-point kernels executing on an Intel® GEN8 Iris Pro GPU by 15.4X and 5.4X, respectively; the optimizations also improve the performance of the two kernels by 5.6X and 3.4X, respectively, on an Intel® Xeon® E3 CPU. 2) Achieving peak floating-point operations per second on the GPU requires much higher arithmetic intensity than on the CPU. 3) Running the floating-point intensive kernel on the processor consumes 48 W, which is very close to the thermal power draw of the processor. The floating-point optimization can reduce the average GPU power from 35.7 W to 22.7 W for the double-precision kernel, and from 33.1 W to 8.8 W for the single-precision kernel.
AB - A processor with a GPU and a CPU integrated on the same chip is a promising low-power system for floating-point intensive applications. While an integrated GPU is not designed to outperform a discrete GPU due to its power, area, and thermal constraints, there is a need to better understand the performance of a floating-point intensive kernel on an integrated GPU. Toward this end, we choose a representative floating-point intensive kernel as a case study. We port the kernel with a vendor-neutral framework, analyze the compiler optimizations of the kernel at the assembly-code level, evaluate the relationship between floating-point operations per second and arithmetic intensity, and compare the performance and power of the kernel implementations on the CPU and GPU. Our key findings are: 1) Compared to an unoptimized kernel, the floating-point optimizations improve the performance of the single- and double-precision floating-point kernels executing on an Intel® GEN8 Iris Pro GPU by 15.4X and 5.4X, respectively; the optimizations also improve the performance of the two kernels by 5.6X and 3.4X, respectively, on an Intel® Xeon® E3 CPU. 2) Achieving peak floating-point operations per second on the GPU requires much higher arithmetic intensity than on the CPU. 3) Running the floating-point intensive kernel on the processor consumes 48 W, which is very close to the thermal power draw of the processor. The floating-point optimization can reduce the average GPU power from 35.7 W to 22.7 W for the double-precision kernel, and from 33.1 W to 8.8 W for the single-precision kernel.
KW - GFLOPS
KW - Integrated GPU
KW - OpenCL
KW - floating-point intensive
UR - http://www.scopus.com/inward/record.url?scp=85079271884&partnerID=8YFLogxK
U2 - 10.1109/IGSC48788.2019.8957171
DO - 10.1109/IGSC48788.2019.8957171
M3 - Conference contribution
AN - SCOPUS:85079271884
T3 - 2019 10th International Green and Sustainable Computing Conference, IGSC 2019
BT - 2019 10th International Green and Sustainable Computing Conference, IGSC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Green and Sustainable Computing Conference, IGSC 2019
Y2 - 21 October 2019 through 24 October 2019
ER -