Population count on Intel® CPU, GPU and FPGA

Zheming Jin, Hal Finkel

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

Population count is a primitive used in many applications. Commodity processors have dedicated instructions for achieving high-performance population count. Motivated by the productivity of high-level synthesis and the importance of population count, in this paper we investigated the OpenCL implementations of population count algorithms, and evaluated their performance and resource utilizations on an FPGA. Based on the results, we select the most efficient implementation. Then we derived a reduction pattern from a representative application of population count. We parallelized the reduction with atomic functions, and optimized it with vectorized memory accesses, tree reduction, and compute-unit duplication. We evaluated the performance of the reduction kernel on an InteloXeono CPU and an Intel® IrisTM Pro integrated GPU, and an FPGA card that features an Intel® Arria® 10 FPGA. When DRAM memory bandwidth is comparable on the three computing platforms, the FPGA can achieve the highest kernel performance for large workload. On the other hand, we described performance bottlenecks on the FPGA. To make FPGAs more competitive in raw performance compared to high-performant CPU and GPU platforms, it is important to increase external memory bandwidth, minimize data movement between a host and a device, and reduce OpenCL runtime overhead on an FPGA.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages432-439
Number of pages8
ISBN (Electronic)9781728174457
DOIs
StatePublished - May 2020
Externally publishedYes
Event34th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020 - New Orleans, United States
Duration: May 18 2020May 22 2020

Publication series

NameProceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Conference

Conference34th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
Country/TerritoryUnited States
CityNew Orleans
Period05/18/2005/22/20

Funding

ACKNOWLEDGMENT We appreciate the reviewers for their constructive criticisms. The research was supported by the U.S. Department of Energy, Office of Science, under contract DE AC02 06CH11357 and made use of the Argonne Leadership Computing Facility, a DOE Office of Science User Facility.

FundersFunder number
U.S. Department of Energy
Office of ScienceDE AC02 06CH11357

    Keywords

    • Heterogeneous computing
    • OpenCL
    • Population count

    Fingerprint

    Dive into the research topics of 'Population count on Intel® CPU, GPU and FPGA'. Together they form a unique fingerprint.

    Cite this