TY - GEN
T1 - MAGMA embedded
T2 - IEEE High Performance Extreme Computing Conference, HPEC 2015
AU - Haidar, Azzam
AU - Tomov, Stanimire
AU - Luszczek, Piotr
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/11/9
Y1 - 2015/11/9
AB - Embedded computing, not only in large systems like drones and hybrid vehicles but also in small portable devices like smartphones and watches, is becoming more extreme to meet ever-increasing demands for extended and improved functionality. This, combined with the typical constraints of low power consumption and small size, makes the design of numerical libraries for embedded systems challenging. In this paper, we present the design and implementation of embedded-system-aware algorithms that target these challenges in the area of dense linear algebra. We consider the fundamental problems of solving linear systems of equations and least squares problems using the LU, QR, and Cholesky factorizations, and illustrate our results, in terms of both performance and energy efficiency, on the Jetson TK1 development kit. We developed performance optimizations for both small and large problems. In contrast to the corresponding LAPACK algorithms, the new designs target the use of many cores, now readily available even in mobile devices like the Jetson TK1, which features 192 CUDA cores. The implementations presented will form the core of a MAGMA Embedded library, to be released as part of the MAGMA libraries.
UR - http://www.scopus.com/inward/record.url?scp=84964875309&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2015.7322444
DO - 10.1109/HPEC.2015.7322444
M3 - Conference contribution
AN - SCOPUS:84964875309
T3 - 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
BT - 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 September 2015 through 17 September 2015
ER -