TY - GEN
T1 - Evaluating floating-point intensive applications on OpenCL FPGA platforms
T2 - 2018 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2018
AU - Jin, Zheming
AU - Finkel, Hal
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - FPGAs are becoming a promising heterogeneous computing component for scientific computing as floating-point optimized architectures are added to current devices. High-level synthesis tools such as the Intel FPGA SDK for OpenCL provide a streamlined design flow that makes FPGAs easier for researchers to use. In this paper, we use a nuclear reactor simulation application, the SimpleMOC kernel, as a case study to evaluate the potential and effectiveness of FPGAs for floating-point intensive applications. We describe OpenCL implementations of the kernel optimized with low-latency floating-point operators, on-chip memory accesses, loop transformations, kernel vectorization, and compute-unit duplication on an Intel Arria 10-based FPGA platform, and we evaluate their performance and resource utilization. Compared to the baseline OpenCL implementation, our optimizations improve kernel performance by a factor of 102. We also evaluate the kernel on a 16-core Intel Xeon CPU and an Nvidia Tesla K80 GPU. The GPU is approximately 2X faster than the CPU and 7.5X faster than the FPGA. The FPGA consumes 4.5X less power than the GPU and 6.4X less than the CPU. Performance per watt on the FPGA is 1.74X higher than on the CPU and 1.65X lower than on the GPU.
KW - FPGA
KW - OpenCL
KW - SimpleMOC Kernel
UR - http://www.scopus.com/inward/record.url?scp=85063139678&partnerID=8YFLogxK
U2 - 10.1109/RECONFIG.2018.8641693
DO - 10.1109/RECONFIG.2018.8641693
M3 - Conference contribution
AN - SCOPUS:85063139678
T3 - 2018 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2018
BT - 2018 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2018
A2 - Andrews, David
A2 - Cumplido, Rene
A2 - Feregrino, Claudia
A2 - Stroobandt, Dirk
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 December 2018 through 5 December 2018
ER -