Using Advanced Vector Extensions AVX-512 for MPI Reductions

Dong Zhong, Qinglei Cao, George Bosilca, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

6 Scopus citations

Abstract

As the scale of high-performance computing (HPC) systems continues to grow, researchers are devoting themselves to exploring increasing levels of parallelism to achieve optimal performance. The modern CPU's design, including its hierarchical memory and SIMD/vectorization capability, governs the efficiency of algorithms. The recent introduction of wide vector instruction set extensions (AVX and SVE) has made vectorization critically important for increasing efficiency and closing the gap to peak performance. In this paper, we propose an implementation of the predefined MPI reduction operations that uses AVX, AVX2 and AVX-512 intrinsics to provide vector-based reductions and to improve the time-to-solution of these operations. With these optimizations, we achieve higher efficiency for local computations, which directly benefits the overall cost of collective reductions. The evaluation of the resulting software stack under different scenarios demonstrates that the solution is both generic and efficient. Experiments conducted on an Intel Xeon Gold cluster show that our AVX-512 optimized reduction operations achieve a 10X performance improvement over the Open MPI default for MPI local reductions.

Original language: English
Title of host publication: Proceedings of 2020 27th European MPI Users' Group Meeting, EuroMPI/USA 2020
Publisher: Association for Computing Machinery
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781450388801
State: Published - Sep 21 2020
Externally published: Yes
Event: 27th European MPI Users' Group Meeting, EuroMPI/USA 2020 - Virtual, Online, United States
Duration: Sep 21 2020 to Sep 24 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 27th European MPI Users' Group Meeting, EuroMPI/USA 2020
Country/Territory: United States
City: Virtual, Online
Period: 09/21/20 to 09/24/20

Funding

This material is based upon work supported by the National Science Foundation under Grant No. (1725692); and the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would also like to thank Texas Advanced Computing Center (TACC). For computer time, this research used the Stampede2 flagship supercomputer of the Extreme Science and Engineering Discovery Environment (XSEDE) hosted at TACC.

Funders (funder number)
National Science Foundation: 1725692
U.S. Department of Energy: 17-SC-20-SC (Exascale Computing Project)
National Nuclear Security Administration: 17-SC-20-SC (Exascale Computing Project)

    Keywords

    • Instruction level parallelism
    • Intel AVX2/AVX-512
    • Long vector extension
    • MPI reduction operation
    • Single instruction multiple data
    • Vector operation
