Empirical performance tuning of dense linear algebra software

Jack Dongarra, Shirley Moore

Research output: Chapter in Book/Report/Conference proceeding; Chapter; peer-review

2 Scopus citations

Abstract

Dense linear algebra (DLA) forms the core of many scientific computing applications. Consequently, there is continuous interest in, and demand for, efficient algorithms and implementations on new architectures. One response to this demand has been the development of the ATLAS (Automatically Tuned Linear Algebra Software) system, which automatically produces implementations of the BLAS (Basic Linear Algebra Subprograms) routines that underlie all of dense linear algebra. ATLAS generates efficient code by running a series of timing experiments that apply standard performance-improving techniques (loop unrolling, blocking, etc.) to determine optimal parameters and code structures. While ATLAS has been highly successful in tuning DLA for cache-based architectures, we are developing new auto-tuning techniques for multicore and heterogeneous architectures that exploit higher levels of parallelism and asynchronous scheduling. This chapter describes the ATLAS techniques as well as recent research on empirical tuning of dense linear algebra routines for multicore and GPU architectures.
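
The timing-experiment idea in the abstract can be illustrated with a minimal sketch. This is not ATLAS code and does not reproduce its search strategy; it only shows the general pattern of empirical tuning: generate variants of a kernel (here, a blocked matrix multiply with different block sizes), time each one, and keep the fastest. The function names `blocked_matmul` and `tune_block_size`, the candidate block sizes, and the matrix size are all assumptions made for illustration. In pure Python the interpreter overhead dominates, so the timings mostly reflect loop overhead rather than cache behavior.

```python
import time


def blocked_matmul(a, b, n, nb):
    """Multiply two n x n matrices (lists of lists) using nb x nb blocks."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                # Multiply one block of A by one block of B into a block of C.
                for i in range(ii, min(ii + nb, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + nb, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + nb, n)):
                            row_c[j] += aik * row_b[j]
    return c


def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size; return (fastest block size, timings)."""
    # Hypothetical test matrices; any dense n x n data would do.
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, nb)
            # Keep the minimum over repeats to reduce timing noise.
            best = min(best, time.perf_counter() - start)
        timings[nb] = best
    best_nb = min(timings, key=timings.get)
    return best_nb, timings


if __name__ == "__main__":
    best_nb, timings = tune_block_size(n=64, candidates=[4, 8, 16, 32])
    for nb, seconds in sorted(timings.items()):
        print(f"block size {nb:3d}: {seconds:.4f} s")
    print("selected block size:", best_nb)
```

The selected block size depends on the machine and on the run, which is the point of empirical tuning: the parameter is measured on the target system rather than predicted from a model.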

Original language: English
Title of host publication: Performance Tuning of Scientific Applications
Publisher: CRC Press
Pages: 255-272
Number of pages: 18
ISBN (Electronic): 9781439815700
ISBN (Print): 9781439815694
State: Published - Jan 1 2010
Externally published: Yes
