TY - GEN
T1 - OpenCL kernel vectorization on the CPU, GPU, and FPGA
T2 - 27th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2019
AU - Jin, Zheming
AU - Finkel, Hal
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
AB - OpenCL promotes code portability, and natively supports vectorized data types, which allows developers to potentially take advantage of the single-instruction-multiple-data instructions on CPUs, GPUs, and FPGAs. FPGAs are becoming a promising heterogeneous computing component. In our study, we choose a kernel used in frequent pattern compression as a case study of OpenCL kernel vectorizations on the three computing platforms. We describe different pattern matching approaches for the kernel, and manually vectorize the OpenCL kernel by a factor ranging from 2 to 16. We evaluate the kernel on an Intel Xeon 16-core CPU, an NVIDIA P100 GPU, and a Nallatech 385A FPGA card featuring an Intel Arria 10 GX1150 FPGA. Compared to the optimized kernel that is not vectorized, our vectorization can improve the kernel performance by a factor of 16 on the FPGA. The performance improvement ranges from 1 to 11.4 on the CPU, and from 1.02 to 9.3 on the GPU. The effectiveness of kernel vectorization depends on the work-group size.
KW - FPGA
KW - OpenCL
KW - Vectorization
UR - http://www.scopus.com/inward/record.url?scp=85068330001&partnerID=8YFLogxK
U2 - 10.1109/FCCM.2019.00071
DO - 10.1109/FCCM.2019.00071
M3 - Conference contribution
AN - SCOPUS:85068330001
T3 - Proceedings - 27th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2019
SP - 330
BT - Proceedings - 27th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 April 2019 through 1 May 2019
ER -