Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modeling

Piotr Luszczek, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

We present a modeling framework to accurately predict the time to run dense linear algebra calculations. We report the framework's accuracy in a number of varied computational environments such as shared-memory multicore systems, clusters, and large supercomputing installations with tens of thousands of cores. We also test the accuracy for various algorithms, each of which has different scaling properties and a different tolerance to low-bandwidth/high-latency interconnects. The predictive accuracy is very good and on the order of measurement accuracy, which makes the method suitable for both dedicated and non-dedicated environments. We also present a practical application of our model to reduce the time required to tune and optimize large parallel runs whose time is dominated by linear algebra computations. We show practical examples of how to apply the methodology to avoid common pitfalls and reduce the influence of measurement errors and inherent performance variability.

Original language: English
Title of host publication: Parallel Processing and Applied Mathematics - 9th International Conference, PPAM 2011, Revised Selected Papers
Pages: 730-739
Number of pages: 10
Edition: PART 1
State: Published - 2012
Event: 9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011 - Torun, Poland
Duration: Sep 11 2011 to Sep 14 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7203 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011
Country/Territory: Poland
City: Torun
Period: 09/11/11 to 09/14/11

Funding

This research was supported by DARPA through ORNL subcontract 4000075916 as well as NSF through award number 1038814. We would also like to thank Patrick Worley from ORNL for facilitating the large-scale runs on Jaguar’s Cray XT4 partition.

Keywords

  • Linear systems
  • modeling techniques
  • parallel algorithms
