Programming the LU factorization for a multicore system with accelerators

Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution, peer-review)

2 Scopus citations

Abstract

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
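For readers unfamiliar with the procedure named in the abstract, the following is a minimal, unblocked sketch of textbook LU factorization with partial pivoting, in the style of LAPACK's `dgetf2`. It is not the authors' hybrid CPU/GPU implementation; the function name `lu_partial_pivot`, the column-major layout, and the 0-based pivot indices (LAPACK uses 1-based) are illustrative assumptions. At each step it selects the largest-magnitude entry in the current column as the pivot, swaps that row into place, forms the multipliers of L, and applies a rank-1 update to the trailing submatrix, for roughly 2n^3/3 floating-point operations in total.

```c
#include <math.h>

/* Hypothetical helper: in-place LU factorization with partial pivoting
   of an n-by-n column-major matrix A with leading dimension lda.
   On return, the strict lower triangle holds L (unit diagonal implied),
   the upper triangle holds U, and ipiv[k] (0-based) is the row that was
   swapped with row k at step k, so that P*A = L*U.
   Returns 0 on success, or k+1 if column k has no nonzero pivot
   (unlike LAPACK, this sketch stops at that point). */
int lu_partial_pivot(int n, double *A, int lda, int *ipiv)
{
    for (int k = 0; k < n; k++) {
        /* Pivot search: largest |A(i,k)| for i >= k. */
        int p = k;
        double maxval = fabs(A[k + k * lda]);
        for (int i = k + 1; i < n; i++) {
            double v = fabs(A[i + k * lda]);
            if (v > maxval) {
                maxval = v;
                p = i;
            }
        }
        ipiv[k] = p;
        if (maxval == 0.0)
            return k + 1; /* matrix is singular */

        /* Swap rows k and p across the full width, including the
           already computed multipliers of L (as LAPACK's dlaswp does). */
        if (p != k) {
            for (int j = 0; j < n; j++) {
                double t = A[k + j * lda];
                A[k + j * lda] = A[p + j * lda];
                A[p + j * lda] = t;
            }
        }

        /* Scale the subcolumn below the pivot to obtain column k of L. */
        double pivot = A[k + k * lda];
        for (int i = k + 1; i < n; i++)
            A[i + k * lda] /= pivot;

        /* Rank-1 update of the trailing submatrix:
           A(k+1:n, k+1:n) -= L(k+1:n, k) * U(k, k+1:n). */
        for (int j = k + 1; j < n; j++) {
            double ukj = A[k + j * lda];
            for (int i = k + 1; i < n; i++)
                A[i + j * lda] -= A[i + k * lda] * ukj;
        }
    }
    return 0;
}
```

Production codes such as LAPACK's `dgetrf` and HPL block this loop so that most of the work becomes matrix-matrix multiplication on the trailing submatrix; this sketch makes no claim about how the authors partition that work between CPU cores and GPUs.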

Original language: English
Title of host publication: High Performance Computing for Computational Science, VECPAR 2012 - 10th International Conference, Revised Selected Papers
Pages: 28-35
Number of pages: 8
State: Published - 2013
Event: 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012 - Kobe, Japan
Duration: Jul 17 2012 to Jul 20 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7851 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012
Country/Territory: Japan
City: Kobe
Period: 07/17/12 to 07/20/12

