Abstract
Field-programmable gate arrays (FPGAs) are a hardware accelerator option that is growing in popularity. However, FPGAs are notoriously hard to program. To address this, high-level synthesis (HLS) tools have been developed that let programmers design FPGA hardware accelerators using familiar software languages. The two largest FPGA vendors, Intel and Xilinx, both support C/C++ and OpenCL C for constructing kernels. However, little is known about the portability of designs between these two platforms. In this work, we evaluate the portability and performance of Intel and Xilinx kernels. We conduct a case study, porting the Needleman-Wunsch application from the Rodinia benchmark suite, written in Intel OpenCL C, to Xilinx platforms. We use OpenCL C kernels optimized for Intel FPGA platforms as a starting point and first perform a minimum-effort port to a Xilinx FPGA, also using OpenCL C. We find that simply porting optimizations one-to-one is not enough to achieve portable performance. We then seek to improve the performance of those kernels using Xilinx C/C++. By rewriting the kernel for burst transfers and applying other optimizations, we reduce the execution time from an initial 294 s to 2.2 s.
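The abstract credits much of the speedup to rewriting the kernel for burst transfers in Xilinx C/C++. As a rough illustration only, the sketch below shows the general pattern this usually refers to in HLS code: copying contiguous global-memory buffers into on-chip arrays with sequential, pipelined loops (which HLS tools can turn into AXI bursts), computing on the local copies, and writing results back the same way. It is not the authors' kernel. The function name `nw_kernel`, the fixed length `N`, and the simple +1/-1/-1 scoring are hypothetical assumptions; the benchmark's actual scoring and tiling are not reproduced here. The `#pragma HLS` lines are ignored by an ordinary host compiler, so the code also runs as plain C++.

```cpp
#include <algorithm>

// Hypothetical parameters for illustration; not taken from the paper.
constexpr int N = 8;          // length of each input sequence (assumed)
constexpr int MATCH = 1;      // assumed score for a match
constexpr int MISMATCH = -1;  // assumed score for a mismatch
constexpr int GAP = -1;       // assumed linear gap penalty

// Hypothetical top-level kernel written in Vitis HLS C++ style.
extern "C" void nw_kernel(const char *a, const char *b, int *last_row) {
#pragma HLS INTERFACE m_axi port = a bundle = gmem0
#pragma HLS INTERFACE m_axi port = b bundle = gmem1
#pragma HLS INTERFACE m_axi port = last_row bundle = gmem0
  char la[N], lb[N];

  // Burst-friendly reads: one sequential, pipelined pass over each
  // contiguous input buffer into on-chip memory.
read_a:
  for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II = 1
    la[i] = a[i];
  }
read_b:
  for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II = 1
    lb[i] = b[i];
  }

  // Needleman-Wunsch recurrence on the local copies, keeping two rows:
  // F[i][j] = max(F[i-1][j-1] + s(a_i, b_j), F[i-1][j] + GAP, F[i][j-1] + GAP)
  // (left unoptimized here; a real kernel would also parallelize this part)
  int prev[N + 1], cur[N + 1];
  for (int j = 0; j <= N; ++j) prev[j] = j * GAP;
  for (int i = 1; i <= N; ++i) {
    cur[0] = i * GAP;
    for (int j = 1; j <= N; ++j) {
      int s = (la[i - 1] == lb[j - 1]) ? MATCH : MISMATCH;
      cur[j] = std::max({prev[j - 1] + s, prev[j] + GAP, cur[j - 1] + GAP});
    }
    for (int j = 0; j <= N; ++j) prev[j] = cur[j];
  }

  // Burst-friendly write-back of the final row of scores.
write_out:
  for (int j = 0; j <= N; ++j) {
#pragma HLS PIPELINE II = 1
    last_row[j] = prev[j];
  }
}
```

The design choice being illustrated is the separation of memory traffic from computation: every access to global memory happens in a simple sequential loop, which is the access shape that burst inference favors, while the data-dependent recurrence only touches on-chip arrays.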
Original language | English |
---|---|
Title of host publication | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781665423694 |
State | Published - 2021 |
Event | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 - Virtual, Online, United States. Duration: Sep 20 2021 → Sep 24 2021 |
Publication series
Name | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 |
---|---|
Conference
Conference | 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 09/20/21 → 09/24/21 |
Funding
This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. The authors would like to acknowledge the ORNL Experimental Computing Laboratory team for its support with the compute resources and the software stack. We would also like to acknowledge Amy An for her help in data analysis and presentation. This research was supported in part by the following sources: the National Science Foundation (NSF) under grant CNS-1763503, the Defense Advanced Research Projects Agency (DARPA) Microsystems Technology Office (MTO) Domain-Specific System-on-Chip Program, and the US Department of Energy (DOE) Advanced Scientific Computing Research (ASCR) program.