Abstract
This paper describes a lightweight framework that enables autotuning of OpenMP pragmas to ease performance tuning of OpenMP codes across platforms. It presents a prototype of the framework and demonstrates its use in identifying the best-performing parallel loop schedules and thread counts for five codes from the PolyBench benchmark suite. The process is driven by a tool that takes a compact search-space description of the pragmas to apply to a loop nest and chooses the best solution using model-based search. This tool offers the potential to achieve performance portability of OpenMP across platforms without burdening the programmer with exploring this search space manually. Performance results show that the tool identifies different selections for schedule and thread count applied to parallel loops across benchmarks, data set sizes and architectures. Performance gains over the baseline with default settings reach up to 1.17×, but slowdowns of 0.5× show the importance of preserving default settings. More importantly, this experiment sets the stage for more elaborate experiments to map new OpenMP features such as GPU offloading and the new loop pragma.
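To make the idea of a compact search-space description concrete, the following is a minimal sketch, not the paper's tool. It assumes a hypothetical dictionary format for the search space, hypothetical helper names (`enumerate_configs`, `render_pragma`, `tune`, `mock_measure`), and a made-up cost function in place of compiling and timing a PolyBench kernel. For brevity it searches exhaustively rather than using the model-based search the abstract describes.

```python
from itertools import product

# Hypothetical compact search-space description: each key is a tunable
# parameter of an OpenMP "parallel for" pragma, each value its candidates.
SEARCH_SPACE = {
    "schedule": ["static", "dynamic", "guided"],
    "chunk": [1, 8, 64],
    "num_threads": [2, 4, 8],
}


def enumerate_configs(space):
    """Yield every combination in the space as a dict."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def render_pragma(cfg):
    """Render one configuration as an OpenMP pragma line."""
    return (
        "#pragma omp parallel for "
        f"schedule({cfg['schedule']},{cfg['chunk']}) "
        f"num_threads({cfg['num_threads']})"
    )


def tune(space, measure):
    """Return (best_config, best_time) under the given measure function.

    Exhaustive search, used here only for brevity; it is NOT the
    model-based search described in the abstract.
    """
    return min(
        ((cfg, measure(cfg)) for cfg in enumerate_configs(space)),
        key=lambda pair: pair[1],
    )


def mock_measure(cfg):
    """Stand-in for compiling and timing a benchmark (made-up cost model)."""
    penalty = {"static": 0.0, "dynamic": 0.3, "guided": 0.1}[cfg["schedule"]]
    return 10.0 / cfg["num_threads"] + penalty + 0.01 * cfg["chunk"]


best_cfg, best_time = tune(SEARCH_SPACE, mock_measure)
print(render_pragma(best_cfg))
```

In a real setting, `mock_measure` would be replaced by a function that inserts the rendered pragma into the loop nest, builds the benchmark, and returns the measured run time. Including the default settings as one candidate keeps the baseline available when no tuned configuration beats it.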
Original language | English |
---|---|
Title of host publication | OpenMP |
Subtitle of host publication | Conquering the Full Hardware Spectrum - 15th International Workshop on OpenMP, IWOMP 2019, Proceedings |
Editors | Xing Fan, Oliver Sinnen, Nasser Giacaman, Bronis R. de Supinski |
Publisher | Springer Verlag |
Pages | 50-60 |
Number of pages | 11 |
ISBN (Print) | 9783030285951 |
State | Published - 2019 |
Externally published | Yes |
Event | 15th International Workshop on OpenMP, IWOMP 2019 - Auckland, New Zealand; Duration: Sep 11 2019 → Sep 13 2019 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11718 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 15th International Workshop on OpenMP, IWOMP 2019 |
---|---|
Country/Territory | New Zealand |
City | Auckland |
Period | 09/11/19 → 09/13/19 |
Funding
This research was supported in part by the Exascale Computing Project (17-SC-20SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program under the RAPIDS Subcontract Award Number 4000159989.
Keywords
- Autotuning
- Loop scheduling
- Performance portability