TY - GEN
T1 - An evaluation of user-level failure mitigation support in MPI
AU - Bland, Wesley
AU - Bouteiller, Aurelien
AU - Herault, Thomas
AU - Hursey, Joshua
AU - Bosilca, George
AU - Dongarra, Jack J.
PY - 2012
Y1 - 2012
AB - As the scale of computing platforms becomes increasingly extreme, the requirements for application fault tolerance are increasing as well. Techniques to address this problem by improving the resilience of algorithms have been developed, but they currently receive no support from the programming model, and without such support, they are bound to fail. This paper discusses the failure-free overhead and recovery impact aspects of the User-Level Failure Mitigation proposal presented in the MPI Forum. Experiments demonstrate that fault-aware MPI has little or no impact on performance for a range of applications, and produces satisfactory recovery times when there are failures.
UR - http://www.scopus.com/inward/record.url?scp=84867646266&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-33518-1_24
DO - 10.1007/978-3-642-33518-1_24
M3 - Conference contribution
AN - SCOPUS:84867646266
SN - 9783642335174
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 193
EP - 203
BT - Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Proceedings
T2 - 19th European MPI Users' Group Meeting on Recent Advances in the Message Passing Interface, EuroMPI 2012
Y2 - 23 September 2012 through 26 September 2012
ER -