TY - JOUR
T1 - HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
AU - Dongarra, Jack
AU - Gates, Mark
AU - Haidar, Azzam
AU - Jia, Yulu
AU - Kabir, Khairul
AU - Luszczek, Piotr
AU - Tomov, Stanimire
N1 - Publisher Copyright: Copyright © 2015 Jack Dongarra et al.
PY - 2015
Y1 - 2015
N2 - This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore systems with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open-source, high-performance library that incorporates the developments presented here and, more broadly, provides DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. LAPACK compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which shields the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
AB - This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore systems with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open-source, high-performance library that incorporates the developments presented here and, more broadly, provides DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. LAPACK compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which shields the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
UR - http://www.scopus.com/inward/record.url?scp=84929340698&partnerID=8YFLogxK
U2 - 10.1155/2015/502593
DO - 10.1155/2015/502593
M3 - Article
AN - SCOPUS:84929340698
SN - 1058-9244
VL - 2015
JO - Scientific Programming
JF - Scientific Programming
M1 - 502593
ER -