Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems

Seyong Lee, Rudolf Eigenmann

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

27 Scopus citations

Abstract

Sparse matrix-vector (SpMV) multiplication is a widely used kernel in scientific applications. In these applications, the SpMV multiplication is usually deeply nested within multiple loops and thus executed a large number of times. We have observed that there can be significant performance variability, due to irregular memory access patterns. Static performance optimizations are difficult because the patterns may be known only at runtime. In this paper, we propose adaptive runtime tuning mechanisms to improve the parallel performance on distributed memory systems. Our adaptive iteration-to-process mapping mechanism balances computational load at runtime with negligible overhead (1% on average), and our runtime communication selection algorithm searches for the best communication method for a given data distribution and mapping. Actual runs on 26 real matrices show that our runtime tuning system reduces execution time up to 68.8% (30.9% on average) over a base block-distributed parallel algorithm on distributed systems with 32 nodes.

Original languageEnglish
Title of host publicationICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing
Pages195-204
Number of pages10
DOIs
StatePublished - 2008
Externally publishedYes
Event22nd ACM International Conference on Supercomputing, ICS'08 - Island of Kos, Greece
Duration: Jun 7 2008Jun 12 2008

Publication series

NameProceedings of the International Conference on Supercomputing

Conference

Conference22nd ACM International Conference on Supercomputing, ICS'08
Country/TerritoryGreece
CityIsland of Kos
Period06/7/0806/12/08

Keywords

  • Process mapping
  • Runtime tuning
  • Sparse matrix

Fingerprint

Dive into the research topics of 'Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems'. Together they form a unique fingerprint.

Cite this