Exploring integer sum reduction using atomics on Intel CPU

Zheming Jin, Hal Finkel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Atomic functions are useful for updating a variable shared by multiple threads, implementing barrier synchronization, constructing complex data structures, and building high-level frameworks. In this paper, we focus on the evaluation and analysis of integer sum reduction, a common data-parallel primitive. We convert the sequential reduction into parallel OpenCL implementations on a CPU. To understand the relationship between kernel performance and the operations involved in reduction, we develop three micro-kernels that expose the costs of (1) one atomic addition to global memory from one work-item per work-group, (2) a work-group barrier, and (3) reducing within a work-group to local memory using one atomic addition per work-item. The sum reduction kernel with vectorized memory accesses improves on the performance of the baseline kernel for a wide range of work-group sizes. However, the vectorization efficiency decreases as the work-group size grows. We also find that the vendor's default OpenCL kernel optimization does not improve kernel performance. When the vectorization width is 16, the speedup of our manual vectorization over the vendor's auto-vectorization ranges from 1.03x to 16.7x. We attribute the performance drop under the default kernel optimizations to the fact that they instantiate a large number of atomic operations.
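The reduction pattern described in the abstract, where each work-group contributes a single atomic addition to a global sum, can be illustrated with a minimal host-side sketch. This is not the authors' OpenCL kernel: it is a C++ analogue that assumes one `std::thread` stands in for one work-group, and the function name `atomic_sum_reduce` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): sums `data` using `num_groups`
// threads. Each thread plays the role of one work-group: it reduces its own
// slice into a private partial sum, then issues exactly one atomic addition
// to the shared "global memory" accumulator.
long long atomic_sum_reduce(const std::vector<int>& data, std::size_t num_groups) {
    assert(num_groups > 0 && "need at least one group");

    std::atomic<long long> global_sum{0};
    // Ceiling division so that every element belongs to exactly one slice.
    const std::size_t chunk = (data.size() + num_groups - 1) / num_groups;

    std::vector<std::thread> groups;
    groups.reserve(num_groups);
    for (std::size_t g = 0; g < num_groups; ++g) {
        const std::size_t begin = std::min(g * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        groups.emplace_back([&data, &global_sum, begin, end] {
            long long partial = 0;  // private to this "work-group"
            for (std::size_t i = begin; i < end; ++i) {
                partial += data[i];
            }
            // One atomic addition per group, mirroring the first micro-kernel.
            global_sum.fetch_add(partial, std::memory_order_relaxed);
        });
    }
    for (auto& t : groups) {
        t.join();  // join() makes every thread's update visible to the caller
    }
    return global_sum.load();
}
```

Calling `atomic_sum_reduce(values, 4)` on a vector `values` returns the same total as a sequential loop. Accumulating privately first keeps the number of atomic operations proportional to the number of groups rather than the number of elements, which is the cost the abstract identifies as dominant when many atomics are instantiated.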

Original language: English
Title of host publication: Proceedings of the International Workshop on OpenCL, IWOCL 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450362306
State: Published - May 13 2019
Externally published: Yes
Event: 2019 International Workshop on OpenCL, IWOCL 2019 - Boston, United States
Duration: May 13 2019 to May 15 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 International Workshop on OpenCL, IWOCL 2019
Country/Territory: United States
City: Boston
Period: 05/13/19 to 05/15/19

Keywords

  • Atomics
  • CPU
  • Integer sum reduction
  • OpenCL
  • Vectorization
