Abstract
With rapidly changing microprocessor designs and growing architectural diversity (multi-cores, many-cores, accelerators) in next-generation HPC systems, scientific applications must adapt to the hardware to exploit the different types of parallelism and resources it provides. To benefit from all of the in-node hardware threads, it is important to use a single programming model to map and coordinate the available work across the heterogeneous execution units in a node (e.g., latency-optimized multi-core hardware threads and bandwidth-optimized accelerators). Our goal is to show that the node complexity of these systems can be managed by using OpenMP for in-node parallelization, exploiting the different “programming styles” that OpenMP 4.5 supports for programming CPU cores and accelerators. Finding the most suitable programming style (e.g., SPMD style, multi-level tasks, accelerator programming, nested parallelism, or a combination of these) with the latest OpenMP features, so as to maximize performance and achieve performance portability across heterogeneous and homogeneous systems, is still an open research problem. We developed a mini-application, Kronecker Product (KP), from the computational motif (sparse matrix algebra) of the original DMRG++ application, used it to experiment with different OpenMP programming styles on an OpenPOWER architecture, and present the results in this paper.
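To make the idea of OpenMP “programming styles” concrete, below is a minimal sketch, not the KP mini-application's actual code, of a Kronecker-product matrix-vector multiply y = (A ⊗ B)x in C. It assumes small dense row-major matrices (DMRG++ itself works with sparse matrices) and uses the standard identity that, if x is viewed as an n × q matrix X, then y viewed as an m × p matrix equals A · (X · Bᵀ), so the mp × nq Kronecker matrix is never formed. The function names `kron_matvec_naive` and `kron_matvec_factored` are hypothetical. The first step uses a data-parallel worksharing loop and the second uses an OpenMP 4.5 `taskloop`, illustrating two of the styles named above; accelerator offload and nested parallelism are not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference: y = (A kron B) x by indexing the Kronecker product directly.
 * A is m x n, B is p x q, x has n*q entries, y has m*p entries (row-major).
 * Cost O(m*n*p*q); used only to check the factored version. */
void kron_matvec_naive(int m, int n, int p, int q,
                       const double *A, const double *B,
                       const double *x, double *y)
{
    assert(A && B && x && y);
    for (int i = 0; i < m; i++)
        for (int k = 0; k < p; k++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                for (int l = 0; l < q; l++)
                    s += A[i * n + j] * B[k * q + l] * x[j * q + l];
            y[i * p + k] = s;
        }
}

/* Factored: view x as an n x q matrix X and compute Y = A * (X * B^T).
 * Returns 0 on success, -1 if the temporary allocation fails.
 * Without -fopenmp the pragmas are ignored and the code runs serially. */
int kron_matvec_factored(int m, int n, int p, int q,
                         const double *A, const double *B,
                         const double *x, double *y)
{
    assert(A && B && x && y);
    double *T = malloc((size_t)n * p * sizeof *T); /* T = X * B^T, n x p */
    if (T == NULL)
        return -1;

    /* Style 1: data-parallel worksharing loop over the (j, k) entries of T. */
    #pragma omp parallel for collapse(2)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < p; k++) {
            double s = 0.0;
            for (int l = 0; l < q; l++)
                s += x[j * q + l] * B[k * q + l];
            T[j * p + k] = s;
        }

    /* Style 2: task parallelism; a single thread generates tasks over the
     * rows of Y, and the implicit taskgroup waits for them to finish. */
    #pragma omp parallel
    #pragma omp single
    #pragma omp taskloop
    for (int i = 0; i < m; i++)
        for (int k = 0; k < p; k++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += A[i * n + j] * T[j * p + k];
            y[i * p + k] = s;
        }

    free(T);
    return 0;
}
```

The factored form reduces the work from O(mnpq) to O(npq + mnp), which is why structured Kronecker products are usually applied this way rather than assembled explicitly; choosing between the worksharing and tasking variants for each step is exactly the kind of decision the programming-style study addresses.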
Original language | English |
---|---|
Title of host publication | High Performance Computing - ISC High Performance 2018 International Workshops, Revised Selected Papers |
Editors | John Shalf, Sadaf Alam, Rio Yokota, Michèle Weiland |
Publisher | Springer Verlag |
Pages | 418-431 |
Number of pages | 14 |
ISBN (Print) | 9783030024642 |
DOIs | |
State | Published - 2018 |
Event | International Conference on High Performance Computing, ISC High Performance 2018 - Frankfurt, Germany Duration: Jun 28 2018 → Jun 28 2018 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11203 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | International Conference on High Performance Computing, ISC High Performance 2018 |
---|---|
Country/Territory | Germany |
City | Frankfurt |
Period | 06/28/18 → 06/28/18 |
Funding
Acknowledgment. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy. G. Alvarez: author contribution consisted of explaining the DMRG algorithm and its implementation, not of the OpenMP use and evaluation. This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- Data parallelism
- Nested parallelism
- OpenMP
- OpenMP 4.5
- Power8
- Task parallelism