TY - GEN
T1 - Porting DMRG++ Scientific Application to OpenPOWER
AU - Chatterjee, Arghya
AU - Alvarez, Gonzalo
AU - D’Azevedo, Eduardo
AU - Elwasif, Wael
AU - Hernandez, Oscar
AU - Sarkar, Vivek
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - With the rapidly changing microprocessor designs and architectural diversity (multi-cores, many-cores, accelerators) for the next generation HPC systems, scientific applications must adapt to the hardware, to exploit the different types of parallelism and resources available in the architecture. To get the benefit of all the in-node hardware threads, it is important to use a single programming model to map and coordinate the available work to the different heterogeneous execution units in the node (e.g., multi-core hardware threads (latency optimized), accelerators (bandwidth optimized), etc.). Our goal is to show that we can manage the node complexity of these systems by using OpenMP for in-node parallelization by exploiting different “programming styles” supported by OpenMP 4.5 to program CPU cores and accelerators. Finding out the suitable programming-style (e.g., SPMD style, multi-level tasks, accelerator programming, nested parallelism, or a combination of these) using the latest features of OpenMP to maximize performance and achieve performance portability across heterogeneous and homogeneous systems is still an open research problem. We developed a mini-application, Kronecker Product (KP), from the original DMRG++ application (sparse matrix algebra) computational motif to experiment with different OpenMP programming styles on an OpenPOWER architecture and present their results in this paper.
KW - Data parallelism
KW - Nested parallelism
KW - OpenMP
KW - OpenMP 4.5
KW - Power8
KW - Task parallelism
UR - http://www.scopus.com/inward/record.url?scp=85066148979&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-02465-9_29
DO - 10.1007/978-3-030-02465-9_29
M3 - Conference contribution
AN - SCOPUS:85066148979
SN - 9783030024642
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 418
EP - 431
BT - High Performance Computing - ISC High Performance 2018 International Workshops, Revised Selected Papers
A2 - Shalf, John
A2 - Alam, Sadaf
A2 - Yokota, Rio
A2 - Weiland, Michèle
PB - Springer Verlag
T2 - International Conference on High Performance Computing, ISC High Performance 2018
Y2 - 28 June 2018 through 28 June 2018
ER -