Abstract
This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical cluster approximation) application for the new ARM A64FX processor. The goal is to describe the changes required for the new architecture and to generate efficient single instruction/multiple data (SIMD) instructions that target the new Scalable Vector Extension (SVE) instruction set. During manual tuning, the authors used the LLVM tools to improve code parallelization with OpenMP SIMD, refactored the code and applied transformations that enabled SIMD optimizations, and ensured that the correct libraries were used to achieve optimal performance. These code changes yielded a 1.98× speedup and 78 GFlops on the A64FX processor. The authors aim to automate parts of this effort in the OpenMP Advisor tool, which is built on top of existing and newly introduced LLVM tooling.
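
As a rough illustration of the OpenMP SIMD approach the abstract mentions, the sketch below annotates two simple loops with `omp simd` so that a compiler may vectorize them. The kernels `axpy` and `dot` are hypothetical stand-ins chosen for brevity; they are not taken from DCA++ or from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not from DCA++): y[i] += a * x[i].
// `omp simd` tells the compiler that the iterations are independent,
// so it may emit vector instructions for the loop.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  double* yp = y.data();
  const std::size_t n = x.size();
#pragma omp simd
  for (std::size_t i = 0; i < n; ++i) {
    yp[i] += a * xp[i];
  }
}

// Hypothetical kernel (not from DCA++): dot product of x and y.
// The reduction clause gives each SIMD lane a private partial sum
// that is combined into `sum` after the loop.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  const double* yp = y.data();
  const std::size_t n = x.size();
  double sum = 0.0;
#pragma omp simd reduction(+ : sum)
  for (std::size_t i = 0; i < n; ++i) {
    sum += xp[i] * yp[i];
  }
  return sum;
}
```

With Clang, one plausible way to build this is `-O3 -fopenmp-simd -mcpu=a64fx`, adding `-Rpass=loop-vectorize -Rpass-missed=loop-vectorize` to report which loops were vectorized. These flags are an assumption about a typical setup, not the configuration reported in the paper. If the pragma is not enabled, the code still compiles and produces the same results as a scalar loop.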
Original language | English |
---|---|
Title of host publication | OpenMP |
Subtitle of host publication | Enabling Massive Node-Level Parallelism - 17th International Workshop on OpenMP, IWOMP 2021, Proceedings |
Editors | Simon McIntosh-Smith, Bronis R. de Supinski, Jannis Klinkenberg |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 142-155 |
Number of pages | 14 |
ISBN (Print) | 9783030852610 |
DOIs | |
State | Published - 2021 |
Event | 17th International Workshop on OpenMP, IWOMP 2021 - Bristol, United Kingdom. Duration: Sep 14 2021 → Sep 16 2021 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12870 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 17th International Workshop on OpenMP, IWOMP 2021 |
---|---|
Country/Territory | United Kingdom |
City | Bristol |
Period | 09/14/21 → 09/16/21 |
Funding
Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-819815). This work was supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by US Department of Energy, Office of Science, Advanced Scientific Computing Research (ASCR) and Basic Energy Sciences (BES) Division of Materials Sciences and Engineering. This research was also supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy Office of Science and the National Nuclear Security Administration, in particular its subproject on Scaling OpenMP with LLVM for Exascale performance and portability (SOLLVE).
Keywords
- Compilers
- Feedback
- HPC tools
- LLVM
- OpenMP
- SIMD