A fully empirical autotuned dense QR factorization for multicore architectures

Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanimire Tomov

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

10 Scopus citations

Abstract

Tuning numerical libraries has become more difficult over time as systems grow more sophisticated. In particular, the behaviour of algorithms on modern multicore machines is hard to predict and model. In this paper, we tackle the problem of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that allow us to prune the search space efficiently. Our method is automatic, fast and reliable: the tuning process is performed entirely at install time and takes less than one hour and ten minutes on five of the seven platforms studied. Depending on the platform, we achieve on average 97% to 100% of the optimum performance. This work lays the groundwork for autotuning the PLASMA library and for easy performance portability across hardware systems.
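The abstract does not spell out which empirical properties are used for pruning, so the following is only a generic sketch of how a fully empirical search with pruning can be organized, not the authors' method. It assumes a two-stage scheme: every candidate configuration is benchmarked on a cheap problem size, only the best `keep` survive, and those are re-benchmarked at the target size. The configuration shape (a hypothetical tile size `nb` and inner block size `ib`), the function names, and `synthetic_measure` are all illustrative assumptions; a real tuner would time the actual factorization instead of calling a synthetic model.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical configuration: (tile size nb, inner block size ib).
Config = Tuple[int, int]


def two_stage_search(
    candidates: Iterable[Config],
    measure: Callable[[Config, int], float],
    small_n: int,
    target_n: int,
    keep: int,
) -> Tuple[Config, float]:
    """Return (best configuration, its measured performance at target_n).

    `measure(cfg, n)` must return a performance figure where higher is
    better (e.g. Gflop/s from a timed run on an n x n problem).
    """
    cands: List[Config] = list(candidates)
    if not cands or keep < 1:
        raise ValueError("need at least one candidate and keep >= 1")
    # Stage 1: screen every candidate on a small, fast-to-run problem.
    # sorted() evaluates the key once per candidate.
    screened = sorted(cands, key=lambda c: measure(c, small_n), reverse=True)
    survivors = screened[:keep]  # prune the search space
    # Stage 2: benchmark only the survivors at the size of interest.
    scored = [(c, measure(c, target_n)) for c in survivors]
    return max(scored, key=lambda item: item[1])


def synthetic_measure(cfg: Config, n: int) -> float:
    """Synthetic stand-in for timing a QR run (higher is better).

    NOT a model of any real machine; it only gives the search something
    deterministic to rank in this example.
    """
    nb, ib = cfg
    return (n / 1000) * (1 - abs(nb - 200) / 400) * (1 - abs(ib - 40) / 100)


# Assumed constraint for illustration only: ib must divide nb.
candidates = [
    (nb, ib)
    for nb in (80, 120, 160, 200, 240)
    for ib in (20, 40, 80)
    if nb % ib == 0
]

best, score = two_stage_search(
    candidates, synthetic_measure, small_n=1000, target_n=4000, keep=3
)
print(best, score)  # prints "(200, 40) 4.0"
```

Screening on a small size keeps the expensive stage-2 runs to a handful of configurations; the trade-off is that a configuration which only shines at large sizes could be pruned early, which is why the choice of pruning criteria matters.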

Original language: English
Title of host publication: Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings
Pages: 194-205
Number of pages: 12
Edition: PART 2
State: Published - 2011
Externally published: Yes
Event: 17th International Conference on Parallel Processing, Euro-Par 2011 - Bordeaux, France
Duration: Aug 29, 2011 to Sep 2, 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6853 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Parallel Processing, Euro-Par 2011
Country/Territory: France
City: Bordeaux
Period: 08/29/11 to 09/02/11
