TY - GEN
T1 - Toward Evaluating High-Level Synthesis Portability and Performance between Intel and Xilinx FPGAs
AU - Cabrera, Anthony M.
AU - Young, Aaron R.
AU - Lambert, Jacob
AU - Xiao, Zhili
AU - An, Amy
AU - Lee, Seyong
AU - Jin, Zheming
AU - Kim, Jungwon
AU - Buhler, Jeremy
AU - Chamberlain, Roger D.
AU - Vetter, Jeffrey S.
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/4/27
Y1 - 2021/4/27
N2 - Offloading computation from a CPU to a hardware accelerator is becoming a more common solution for improving performance because traditional gains enabled by Moore's law and Dennard scaling have slowed. GPUs are often used as hardware accelerators, but field-programmable gate arrays (FPGAs) are gaining traction. FPGAs are beneficial because they allow hardware specific to a particular application to be created. However, they are notoriously difficult to program. To this end, two of the main FPGA manufacturers, Intel and Xilinx, have created tools and frameworks that enable the use of higher-level languages to design FPGA hardware. Although Xilinx kernels can be designed by using C/C++, both Intel and Xilinx support the use of OpenCL C to architect FPGA hardware. However, not much is known about the portability and performance between these two device families other than the fact that it is theoretically possible to synthesize a kernel meant for Intel to Xilinx and vice versa. In this work, we evaluate the portability and performance of Intel and Xilinx kernels. We use OpenCL C implementations of a subset of the Rodinia benchmarking suite that were designed for an Intel FPGA and make the necessary modifications to create synthesizable OpenCL C kernels for a Xilinx FPGA. We find that the difficulty of porting certain kernel optimizations varies, depending on the construct. Once the minimum number of modifications is made to create synthesizable hardware for the Xilinx platform, more nontrivial work is needed to improve performance. However, we find that constructs that are known to be performant for an FPGA should improve performance regardless of the platform; the difficulty comes in deciding how to invoke certain kernel optimizations while also abiding by the constraints enforced by a given platform's hardware compiler.
KW - FPGA
KW - Rodinia
KW - Xilinx
KW - hardware accelerator
KW - high level synthesis
KW - performance
KW - portability
UR - http://www.scopus.com/inward/record.url?scp=85105477381&partnerID=8YFLogxK
U2 - 10.1145/3456669.3456699
DO - 10.1145/3456669.3456699
M3 - Conference contribution
AN - SCOPUS:85105477381
T3 - ACM International Conference Proceeding Series
BT - International Workshop on OpenCL, IWOCL 2021
PB - Association for Computing Machinery
T2 - 2021 International Workshop on OpenCL, IWOCL 2021
Y2 - 27 April 2021 through 29 April 2021
ER -