A Study on Atomics-based Integer Sum Reduction in HIP on AMD GPU

Zheming Jin, Jeffrey Vetter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Integer sum reduction is a primitive operation commonly used in scientific computing. Implementing a parallel reduction on a GPU often involves concurrent memory accesses using atomic operations and synchronization of work-items in a work-group. For a better understanding of these operations, we redesigned micro-kernels in the HIP programming language to measure the time of atomic operations over global memory, the cost of barrier synchronization, and the time of a reduction within a work-group to shared local memory using one atomic addition per work-item on a compute unit in an AMD MI100 GPU. Then, we describe the implementations of the reduction kernels with vectorized memory accesses, parameterized workload sizes, and the vendor's library APIs. Our experimental results show that (1) there is a performance tradeoff between the cost of barrier synchronization and the amount of parallelism from atomic operations over shared local memory when we increase the size of a work-group; (2) a reduction kernel with vectorized memory accesses and vector data types is approximately 3% faster for the large problem size than the kernels written with the vendor's library APIs; (3) the compiler needs to assist the hardware processor with data dependency resolution at the level of the instruction set architecture; and (4) the power consumption of the kernel execution on the GPU fluctuates between 277 Watts and 301 Watts, and the dynamic power of other GPU activities is at most 31 Watts.
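
The two-level structure mentioned in the abstract (one atomic addition per work-item into a group-local accumulator, a barrier, then an atomic addition into a global result) can be illustrated with a minimal CPU sketch in standard C++. This is not the authors' HIP code and says nothing about GPU performance: threads stand in for work-items, a `std::atomic` stands in for shared local memory, joining the threads stands in for the barrier, and the name `atomic_sum` and its parameters are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// CPU analogy of an atomics-based two-level sum reduction (assumption:
// this only mirrors the structure described in the abstract, not HIP).
long long atomic_sum(const std::vector<int>& data,
                     std::size_t num_groups, std::size_t group_size) {
    std::atomic<long long> global_sum{0};            // stands in for global memory
    const std::size_t total_items = num_groups * group_size;

    std::vector<std::thread> groups;
    for (std::size_t g = 0; g < num_groups; ++g) {
        groups.emplace_back([&, g] {
            std::atomic<long long> local_sum{0};     // stands in for shared local memory
            std::vector<std::thread> items;
            for (std::size_t t = 0; t < group_size; ++t) {
                items.emplace_back([&, g, t] {
                    // Each "work-item" sums its strided slice privately...
                    long long priv = 0;
                    for (std::size_t i = g * group_size + t; i < data.size();
                         i += total_items) {
                        priv += data[i];
                    }
                    // ...then issues exactly one atomic addition to the group sum.
                    local_sum.fetch_add(priv, std::memory_order_relaxed);
                });
            }
            // Joining all work-items plays the role of the work-group barrier.
            for (auto& th : items) th.join();
            // One atomic addition per "work-group" into the global result.
            global_sum.fetch_add(local_sum.load(), std::memory_order_relaxed);
        });
    }
    for (auto& th : groups) th.join();
    return global_sum.load();
}
```

For example, calling `atomic_sum(v, 4, 8)` on a vector holding the integers 1 through 1000 should return 500500. Growing `group_size` adds more concurrent atomic additions per group but also more threads to wait for at the join, which loosely mirrors the barrier-versus-parallelism tradeoff the abstract reports for larger work-groups.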

Original language: English
Title of host publication: 51st International Conference on Parallel Processing, ICPP 2022 - Workshop Proceedings
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450394451
State: Published - Aug 29 2022
Event: 51st International Conference on Parallel Processing, ICPP 2022 - Virtual, Online, France
Duration: Aug 29 2022 → Sep 1 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 51st International Conference on Parallel Processing, ICPP 2022
Country/Territory: France
City: Virtual, Online
Period: 08/29/22 → 09/1/22

Funding

We thank the reviewers for their criticisms. This research used resources of the Experimental Computing Lab at ORNL. This research was supported by the US Department of Energy Advanced Scientific Computing Research program under Contract No. DE-AC05-00OR22725.

Funders | Funder number
US Department of Energy Advanced Scientific Computing Research | DE-AC05-00OR22725

    Keywords

    • GPU
    • Parallel reduction
    • programming language
