TY - GEN
T1 - Flexible linear algebra development and scheduling with Cholesky factorization
AU - Haidar, Azzam
AU - Yarkhan, Asim
AU - Cao, Chongxiao
AU - Luszczek, Piotr
AU - Tomov, Stanimire
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/11/23
Y1 - 2015/11/23
N2 - Modern high performance computing environments are composed of networks of compute nodes that often contain a variety of heterogeneous compute resources, such as multicore CPUs and GPUs. One challenge faced by domain scientists is how to efficiently use all these distributed, heterogeneous resources. In order to use the GPUs effectively, the workload parallelism needs to be much greater than the parallelism for a multicore-CPU. Additionally, effectively using distributed memory nodes brings out another level of complexity where the workload must be carefully partitioned over the nodes. In this work we are using a lightweight runtime environment to handle many of the complexities in such distributed, heterogeneous systems. The runtime environment uses task-superscalar concepts to enable the developer to write serial code while providing parallel execution. The task-programming model allows the developer to write resource-specialization code, so that each resource gets the appropriate sized workload-grain. Our task-programming abstraction enables the developer to write a single algorithm that will execute efficiently across the distributed heterogeneous machine. We demonstrate the effectiveness of our approach with performance results for dense linear algebra applications, specifically the Cholesky factorization.
AB - Modern high performance computing environments are composed of networks of compute nodes that often contain a variety of heterogeneous compute resources, such as multicore CPUs and GPUs. One challenge faced by domain scientists is how to efficiently use all these distributed, heterogeneous resources. In order to use the GPUs effectively, the workload parallelism needs to be much greater than the parallelism for a multicore-CPU. Additionally, effectively using distributed memory nodes brings out another level of complexity where the workload must be carefully partitioned over the nodes. In this work we are using a lightweight runtime environment to handle many of the complexities in such distributed, heterogeneous systems. The runtime environment uses task-superscalar concepts to enable the developer to write serial code while providing parallel execution. The task-programming model allows the developer to write resource-specialization code, so that each resource gets the appropriate sized workload-grain. Our task-programming abstraction enables the developer to write a single algorithm that will execute efficiently across the distributed heterogeneous machine. We demonstrate the effectiveness of our approach with performance results for dense linear algebra applications, specifically the Cholesky factorization.
KW - Accelerator-based distributed memory computers
KW - Cholesky factorization
KW - Heterogeneous HPC computing
KW - Superscalar dataflow scheduling
UR - http://www.scopus.com/inward/record.url?scp=84961761036&partnerID=8YFLogxK
U2 - 10.1109/HPCC-CSS-ICESS.2015.285
DO - 10.1109/HPCC-CSS-ICESS.2015.285
M3 - Conference contribution
AN - SCOPUS:84961761036
T3 - Proceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015
SP - 861
EP - 864
BT - Proceedings - 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security and 2015 IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-CSS-ICESS 2015
Y2 - 24 August 2015 through 26 August 2015
ER -