TY - GEN
T1 - Unified development for mixed multi-GPU and multi-coprocessor environments using a lightweight runtime environment
AU - Haidar, Azzam
AU - Cao, Chongxiao
AU - Yarkhan, Asim
AU - Luszczek, Piotr
AU - Tomov, Stanimire
AU - Kabir, Khairul
AU - Dongarra, Jack
PY - 2014
Y1 - 2014
AB - Many of the heterogeneous resources available to modern computers are designed for different workloads. In order to efficiently use GPU resources, the workload must have a greater degree of parallelism than a workload designed for multicore-CPUs. And conceptually, the Intel Xeon Phi coprocessors are capable of handling workloads somewhere in between the two. This multitude of applicable workloads will likely lead to mixing multicore-CPUs, GPUs, and Intel coprocessors in multi-user environments that must offer adequate computing facilities for a wide range of workloads. In this work, we are using a lightweight runtime environment to manage the resource-specific workload, and to control the dataflow and parallel execution in two-way hybrid systems. The lightweight runtime environment uses task superscalar concepts to enable the developer to write serial code while providing parallel execution. In addition, our task abstractions enable unified algorithmic development across all the heterogeneous resources. We provide performance results for dense linear algebra applications, demonstrating the effectiveness of our approach and full utilization of a wide variety of accelerator hardware.
KW - dense linear algebra
KW - hardware accelerators
KW - runtime scheduling
UR - http://www.scopus.com/inward/record.url?scp=84906695224&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2014.58
DO - 10.1109/IPDPS.2014.58
M3 - Conference contribution
AN - SCOPUS:84906695224
SN - 9780769552071
T3 - Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
SP - 491
EP - 500
BT - Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS 2014
PB - IEEE Computer Society
T2 - 28th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2014
Y2 - 19 May 2014 through 23 May 2014
ER -