Abstract
Solving tridiagonal systems is one of the most computationally expensive parts of many applications, and multiple studies have therefore explored the use of NVIDIA GPUs to accelerate this computation. However, these studies have mainly focused on parallel algorithms, which efficiently exploit shared memory and can saturate the GPU's capacity with a small number of systems, but scale poorly when the number of systems is relatively high. We propose a new implementation (cuThomasBatch) based on the Thomas algorithm. To achieve good scalability with this approach, it is necessary to transform the way the inputs are stored in memory in order to exploit coalescence (contiguous threads accessing contiguous memory locations). The results of this study show that our implementation beats the reference code when dealing with a relatively large number of tridiagonal systems (2,000–256,000), being close to 3× (in double precision) and 4× (in single precision) faster on one NVIDIA Kepler GPU.
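The storage transformation mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the cuThomasBatch code: the function names `interleave` and `thomas_batch_interleaved`, the argument order, and the exact layout (element `i` of system `s` stored at index `i * batch + s`) are assumptions made for illustration. The outer loop over `s` stands in for one GPU thread per system, so at each step `i` neighbouring "threads" touch neighbouring memory locations. The sketch uses the plain Thomas algorithm without pivoting, so it assumes well-conditioned (for example diagonally dominant) systems.

```python
def interleave(values, n, batch):
    """Convert batch-major storage (system s occupies [s*n, (s+1)*n))
    into interleaved storage (element i of system s at index i*batch + s)."""
    out = [0.0] * (n * batch)
    for s in range(batch):
        for i in range(n):
            out[i * batch + s] = values[s * n + i]
    return out


def thomas_batch_interleaved(a, b, c, d, n, batch):
    """Solve `batch` tridiagonal systems of size n stored in interleaved layout.

    a: sub-diagonal (entries for row 0 are unused)
    b: main diagonal
    c: super-diagonal (entries for row n-1 are unused)
    d: right-hand side
    Returns the solutions in the same interleaved layout; inputs are not modified.
    """
    cp = [0.0] * (n * batch)  # modified super-diagonal
    x = list(d)               # becomes the modified RHS, then the solution
    for s in range(batch):    # on a GPU, each s would map to one thread
        # Forward elimination
        idx = s
        cp[idx] = c[idx] / b[idx]
        x[idx] = x[idx] / b[idx]
        for i in range(1, n):
            idx = i * batch + s
            prev = idx - batch
            denom = b[idx] - a[idx] * cp[prev]
            cp[idx] = c[idx] / denom
            x[idx] = (x[idx] - a[idx] * x[prev]) / denom
        # Back substitution
        for i in range(n - 2, -1, -1):
            idx = i * batch + s
            x[idx] -= cp[idx] * x[idx + batch]
    return x
```

In this layout, consecutive values of `s` read consecutive addresses for a fixed row `i`, which is the access pattern that coalescing rewards; in batch-major storage the same reads would be `n` elements apart.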
Original language | English |
---|---|
Title of host publication | Parallel Processing and Applied Mathematics - 12th International Conference, PPAM 2017, Revised Selected Papers |
Editors | Jack Dongarra, Roman Wyrzykowski, Konrad Karczewski, Ewa Deelman |
Publisher | Springer Verlag |
Pages | 243-253 |
Number of pages | 11 |
ISBN (Print) | 9783319780238 |
State | Published - 2018 |
Externally published | Yes |
Event | 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017 - Czestochowa, Poland. Duration: Sep 10 2017 → Sep 13 2017 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10777 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017 |
---|---|
Country/Territory | Poland |
City | Czestochowa |
Period | 09/10/17 → 09/13/17 |
Funding
Acknowledgements. This project was funded from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 720270 (HBP SGA1), from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P) and the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051). We thank the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence and the valuable feedback provided by Lung Sheng Chien (software engineer at NVIDIA) and Alex Fit-Florea (Leading algorithms groups at NVIDIA). Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266.
Keywords
- CR
- CUDA
- PCR
- Parallel processing
- Scalability
- Thomas algorithm
- Tridiagonal linear systems
- cuSPARSE