Performance-oriented optimizations for OpenCL streaming kernels on the FPGA

Zheming Jin, Hal Finkel

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Field-programmable gate arrays (FPGAs) can implement streaming applications efficiently, and high-level synthesis (HLS) tools allow people with little hardware design knowledge to evaluate applications on FPGAs. There is therefore a need to understand the role OpenCL and FPGAs can play in streaming domains. To this end, we explore the implementation space and discuss techniques for optimizing the performance of streaming kernels using the Intel OpenCL SDK for FPGA. On the Nallatech 385A FPGA platform, which features an Arria 10 GX1150 FPGA, the experimental results show that FPGA resources, such as block RAMs and DSPs, can limit the performance of a kernel before the memory bandwidth constraint takes effect. Kernel vectorization and compute unit duplication are practical optimization techniques that can improve kernel performance by a factor of 2.8 to 10. Combining the two techniques improves performance by a factor of 3.3 to 16, achieving the highest performance. To improve the performance of streaming kernels with compute unit duplication, the local work size needs to be tuned: compared with a duplicated kernel whose local work size has not been tuned, the optimal value can increase performance by a factor of 3 to 70.

Original language: English
Title of host publication: Proceedings of the International Workshop on OpenCL, IWOCL 2018
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450364393
State: Published - May 14 2018
Externally published: Yes
Event: 6th International Workshop on OpenCL, IWOCL 2018 - Oxford, United Kingdom
Duration: May 14 2018 to May 16 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 6th International Workshop on OpenCL, IWOCL 2018
Country/Territory: United Kingdom
City: Oxford
Period: 05/14/18 to 05/16/18

Funding

We are sincerely grateful to the anonymous reviewers for their highly constructive criticism. The research work was supported by the U.S. Department of Energy, Office of Science, under contract DEAC02-06CH11357.

Funder: Funder number
U.S. Department of Energy: (none listed)
Office of Science: DEAC02-06CH11357

    Keywords

    • FPGA
    • OpenCL
    • Streaming kernels

