Abstract
Embedded computing, not only in large systems like drones and hybrid vehicles but also in small portable devices like smartphones and watches, is becoming more extreme to meet ever-increasing demands for extended and improved functionality. This, combined with the typical constraints of low power consumption and small size, makes the design of numerical libraries for embedded systems challenging. In this paper, we present the design and implementation of embedded-system-aware algorithms that target these challenges in the area of dense linear algebra. We consider the fundamental problems of solving linear systems of equations and least squares problems using the LU, QR, and Cholesky factorizations, and illustrate our results, in terms of both performance and energy efficiency, on the Jetson TK1 development kit. We developed performance optimizations for both small and large problems. In contrast to the corresponding LAPACK algorithms, the new designs target the use of many cores, now readily available even in mobile devices like the Jetson TK1, which features 192 CUDA cores. The implementations presented will form the core of a MAGMA Embedded library, to be released as part of the MAGMA libraries.
Original language | English |
---|---|
Title of host publication | 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781467392860 |
State | Published - Nov 9 2015 |
Externally published | Yes |
Event | IEEE High Performance Extreme Computing Conference, HPEC 2015 - Waltham, United States. Duration: Sep 15 2015 → Sep 17 2015 |
Publication series
Name | 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015 |
---|---|
Conference
Conference | IEEE High Performance Extreme Computing Conference, HPEC 2015 |
---|---|
Country/Territory | United States |
City | Waltham |
Period | 09/15/15 → 09/17/15 |
Funding
This material is based upon work supported by the National Science Foundation under Grant ACI-1339822, the Department of Energy, and NVIDIA. The results were obtained in part with the financial support of the Russian Scientific Fund, Agreement N14-11-00190.