Abstract
Using large computer systems such as HPC clusters to their full potential can be hard. Many problems and inefficiencies stem from the interaction of user workloads with system-level policies. These policies cover the various setup choices of the resource management system (RMS) as well as the applied scheduling policy. While expert assessment and well-known best practices help when tuning performance, there is usually plenty of room for further improvement, e.g., by considering more efficient system setups or even radically new scheduling policies. Because such modifications are potentially damaging, it is advisable to evaluate them in a simulator first, which allows repeated evaluation of various setups in a fully controlled manner. This paper presents the latest improvements of the Alea job scheduling simulator, which has been actively developed for over 10 years. We present both recently added advanced simulation capabilities and a set of case studies based on real-life deployments, in which Alea has been used to evaluate major modifications of real HPC and HTC systems.
| Original language | English |
|---|---|
| Title of host publication | Parallel Processing and Applied Mathematics - 13th International Conference, PPAM 2019, Revised Selected Papers |
| Editors | Roman Wyrzykowski, Konrad Karczewski, Ewa Deelman, Jack Dongarra |
| Publisher | Springer |
| Pages | 217-229 |
| Number of pages | 13 |
| ISBN (Print) | 9783030432218 |
| State | Published - 2020 |
| Externally published | Yes |
| Event | 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019, Bialystok, Poland. Duration: Sep 8 2019 → Sep 11 2019 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 12044 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 |
|---|---|
| Country/Territory | Poland |
| City | Bialystok |
| Period | Sep 8 2019 → Sep 11 2019 |
Funding
Acknowledgments. We acknowledge the support and computational resources provided by the MetaCentrum under the program LM2015042, and the support provided by the project Reg. No. CZ.02.1.01/0.0/0.0/16_013/0001797 co-funded by the Ministry of Education, Youth and Sports of the Czech Republic.
Keywords
- Alea
- HPC
- HTC
- Scheduling
- Simulation