Optimizing parallel reduction on opencl FPGA platform - A case study of frequent pattern compression

Zheming Jin, Hal Finkel

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Field-programmable gate arrays (FPGAs) are becoming a promising heterogeneous computing component in high-performance computing. To facilitate the usage of FPGAs for developers and researchers, high-level synthesis tools are pushing the FPGA-based design abstraction from the registertransfer level to high-level language design flow using OpenCL/C/C++. Currently, there are few studies on parallel reduction using atomic functions in the OpenCL-based design flow on an FPGA. Inspired by the reduction operation in frequent pattern compression, we transform the function into an OpenCL kernel, and describe the optimizations of the kernel on an Arria10-based FPGA platform as a case study. We found that automatic kernel vectorization does not improve the kernel performance. Users can manually vectorize the kernel to achieve performance speedup. Overall, our optimizations improve the kernel performance by a factor of 11.9 over the baseline kernel. The performance per watt of the kernel on an Intel Arria 10 GX1150 FPGA is 5.3X higher than an Intel Xeon 16-core CPU while 0.625X lower than an Nvidia K80 GPU.

Original languageEnglish
Title of host publicationProceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages27-35
Number of pages9
ISBN (Print)9781538655559
DOIs
StatePublished - Aug 3 2018
Externally publishedYes
Event32nd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018 - Vancouver, Canada
Duration: May 21 2018May 25 2018

Publication series

NameProceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Conference

Conference32nd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
Country/TerritoryCanada
CityVancouver
Period05/21/1805/25/18

Keywords

  • Atomics
  • FPGA
  • OpenCL
  • Reductions

Fingerprint

Dive into the research topics of 'Optimizing parallel reduction on opencl FPGA platform - A case study of frequent pattern compression'. Together they form a unique fingerprint.

Cite this