TY - GEN
T1 - Redundant execution of HPC applications with MR-MPI
AU - Engelmann, Christian
AU - Böhm, Swen
PY - 2011
Y1 - 2011
N2 - This paper presents a modular-redundant Message Passing Interface (MPI) solution, MR-MPI, for transparently executing high-performance computing (HPC) applications in a redundant fashion. The presented work addresses the deficiencies of recovery-oriented HPC, i.e., checkpoint/restart to/from a parallel file system, at extreme scale by adding the redundancy approach to the HPC resilience portfolio. It utilizes the MPI performance tool interface, PMPI, to transparently intercept MPI calls from an application and to hide all redundancy-related mechanisms. A redundantly executed application runs with r × m native MPI processes, where r is the number of MPI ranks visible to the application and m is the replication degree. Messages between redundant nodes are replicated. Partial replication for tunable resilience is supported. The performance results clearly show the negative impact of the O(m²) messages between replicas. For low-level, point-to-point benchmarks, the impact can be as high as the replication degree. For applications, performance highly depends on the actual communication types and counts. On single-core systems, the overhead can be 0% for embarrassingly parallel applications independent of the employed redundancy configuration, or up to 70-90% for communication-intensive applications in a dual-redundant configuration. On multi-core systems, the overhead can be significantly higher due to the additional communication contention.
AB - This paper presents a modular-redundant Message Passing Interface (MPI) solution, MR-MPI, for transparently executing high-performance computing (HPC) applications in a redundant fashion. The presented work addresses the deficiencies of recovery-oriented HPC, i.e., checkpoint/restart to/from a parallel file system, at extreme scale by adding the redundancy approach to the HPC resilience portfolio. It utilizes the MPI performance tool interface, PMPI, to transparently intercept MPI calls from an application and to hide all redundancy-related mechanisms. A redundantly executed application runs with r × m native MPI processes, where r is the number of MPI ranks visible to the application and m is the replication degree. Messages between redundant nodes are replicated. Partial replication for tunable resilience is supported. The performance results clearly show the negative impact of the O(m²) messages between replicas. For low-level, point-to-point benchmarks, the impact can be as high as the replication degree. For applications, performance highly depends on the actual communication types and counts. On single-core systems, the overhead can be 0% for embarrassingly parallel applications independent of the employed redundancy configuration, or up to 70-90% for communication-intensive applications in a dual-redundant configuration. On multi-core systems, the overhead can be significantly higher due to the additional communication contention.
KW - Fault tolerance
KW - High-performance computing
KW - Message Passing Interface
KW - Redundancy
KW - Resilience
UR - http://www.scopus.com/inward/record.url?scp=79958180996&partnerID=8YFLogxK
U2 - 10.2316/P.2011.719-031
DO - 10.2316/P.2011.719-031
M3 - Conference contribution
AN - SCOPUS:79958180996
SN - 9780889868649
T3 - Proceedings of the 10th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2011
SP - 31
EP - 38
BT - Proceedings of the 10th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2011
T2 - 10th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2011
Y2 - 15 February 2011 through 17 February 2011
ER -