TY - GEN
T1 - Improving the performance of the extreme-scale simulator
AU - Engelmann, Christian
AU - Naughton, Thomas
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/13
Y1 - 2014/11/13
N2 - Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation-based toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation management overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement, such as by reducing the simulation overhead for running the NAS Parallel Benchmark suite inside the simulator from 1,020% to 238% for the conjugate gradient (CG) benchmark and from 102% to 0% for the embarrassingly parallel (EP) benchmark, as well as from 37,511% to 13,808% for CG and from 3,332% to 204% for EP with accurate process failure simulation.
AB - Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation-based toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation management overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement, such as by reducing the simulation overhead for running the NAS Parallel Benchmark suite inside the simulator from 1,020% to 238% for the conjugate gradient (CG) benchmark and from 102% to 0% for the embarrassingly parallel (EP) benchmark, as well as from 37,511% to 13,808% for CG and from 3,332% to 204% for EP with accurate process failure simulation.
KW - High-performance Computing
KW - Message Passing Interface
KW - Parallel Discrete Event Simulation
KW - Performance Prediction
UR - http://www.scopus.com/inward/record.url?scp=84913536699&partnerID=8YFLogxK
U2 - 10.1109/DS-RT.2014.32
DO - 10.1109/DS-RT.2014.32
M3 - Conference contribution
AN - SCOPUS:84913536699
T3 - Proceedings - IEEE International Symposium on Distributed Simulation and Real-Time Applications, DS-RT
SP - 198
EP - 207
BT - Proceedings - IEEE International Symposium on Distributed Simulation and Real-Time Applications, DS-RT
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, DS-RT 2014
Y2 - 1 October 2014 through 3 October 2014
ER -