Abstract
We present a modeling framework that accurately predicts the time required to run dense linear algebra calculations. We report the framework's accuracy in a variety of computational environments, including shared-memory multicore systems, clusters, and large supercomputing installations with tens of thousands of cores. We also test the accuracy for various algorithms, each of which has different scaling properties and a different tolerance to low-bandwidth/high-latency interconnects. The predictive accuracy is on the order of measurement accuracy, which makes the method suitable for both dedicated and non-dedicated environments. We also present a practical application of our model: reducing the time required to tune and optimize large parallel runs whose time is dominated by linear algebra computations. We show practical examples of how to apply the methodology to avoid common pitfalls and to reduce the influence of measurement errors and inherent performance variability.
Original language | English |
---|---|
Title of host publication | Parallel Processing and Applied Mathematics - 9th International Conference, PPAM 2011, Revised Selected Papers |
Pages | 730-739 |
Number of pages | 10 |
Edition | PART 1 |
State | Published - 2012 |
Event | 9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011 - Torun, Poland; Duration: Sep 11 2011 → Sep 14 2011 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Number | PART 1 |
Volume | 7203 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011 |
---|---|
Country/Territory | Poland |
City | Torun |
Period | 09/11/11 → 09/14/11 |
Funding
This research was supported by DARPA through ORNL subcontract 4000075916 and by NSF through award number 1038814. We would also like to thank Patrick Worley from ORNL for facilitating the large-scale runs on Jaguar’s Cray XT4 partition.
Keywords
- Linear systems
- modeling techniques
- parallel algorithms