Abstract
Field-programmable gate arrays (FPGAs) can implement streaming applications efficiently, and high-level synthesis (HLS) tools allow people with little hardware design knowledge to evaluate an application on FPGAs. There is therefore a need to understand what role OpenCL and FPGAs can play in streaming domains. To this end, we explore the implementation space and discuss techniques for optimizing the performance of streaming kernels using the Intel OpenCL SDK for FPGA. On the Nallatech 385A FPGA platform, which features an Arria 10 GX1150 FPGA, the experimental results show that FPGA resources, such as block RAMs and DSPs, can limit the performance of a kernel before the memory bandwidth constraint takes effect. Kernel vectorization and compute unit duplication are practical optimization techniques that can improve kernel performance by a factor of 2.8 to 10. Combining the two techniques achieves the highest performance, improving it by a factor of 3.3 to 16. To improve the performance of streaming kernels with compute unit duplication, the local work size needs to be tuned. Relative to a duplicated kernel without tuning, the optimal value can increase performance by a factor of 3 to 70.
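As a rough illustration of the three tuning knobs named in the abstract, the following OpenCL C fragment is a minimal sketch, not the kernels evaluated in the paper. The kernel name `scale`, its arguments, and the attribute values (256, 4, 2) are hypothetical; the attributes themselves are the kernel attributes documented for the Intel FPGA SDK for OpenCL, and the fragment is meant for that SDK's offline compiler rather than a host C compiler.

```c
// Hypothetical streaming kernel: b[i] = s * a[i].
// All attribute values below are illustrative assumptions, not results from the paper.

// Local work size fixed at compile time; this is the value that would be tuned.
__attribute__((reqd_work_group_size(256, 1, 1)))
// Kernel vectorization: 4 work-items share one pipeline in SIMD fashion.
// The work-group size must be divisible by this value.
__attribute__((num_simd_work_items(4)))
// Compute unit duplication: 2 copies of the kernel pipeline.
__attribute__((num_compute_units(2)))
__kernel void scale(__global const float *restrict a,
                    __global float *restrict b,
                    const float s)
{
    const size_t i = get_global_id(0);  // one element per work-item
    b[i] = s * a[i];
}
```

With this sketch, the host would need to enqueue the kernel with a local work size of 256 and a global work size that is a multiple of 256. Both vectorization and duplication replicate hardware, which is consistent with the abstract's observation that block RAMs and DSPs can become the limiting factor before memory bandwidth does.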
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the International Workshop on OpenCL, IWOCL 2018 |
| Publisher | Association for Computing Machinery |
| ISBN (Electronic) | 9781450364393 |
| DOIs | |
| State | Published - May 14 2018 |
| Externally published | Yes |
| Event | 6th International Workshop on OpenCL, IWOCL 2018 - Oxford, United Kingdom; Duration: May 14 2018 → May 16 2018 |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 6th International Workshop on OpenCL, IWOCL 2018 |
|---|---|
| Country/Territory | United Kingdom |
| City | Oxford |
| Period | 05/14/18 → 05/16/18 |
Funding
We are sincerely grateful to the anonymous reviewers for their highly constructive criticism. The research work was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.
Keywords
- FPGA
- OpenCL
- Streaming kernels