TY - GEN
T1 - Dodging the cost of unavoidable memory copies in message logging protocols
AU - Bosilca, George
AU - Bouteiller, Aurelien
AU - Herault, Thomas
AU - Lemarinier, Pierre
AU - Dongarra, Jack J.
PY - 2010
Y1 - 2010
N2 - With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault-tolerant techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols is the performance penalty on communications during failure-free periods, mostly coming from the payload copy introduced for each message. In this paper, we investigate different approaches for logging payload and compare their impact on network performance.
AB - With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault-tolerant techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols is the performance penalty on communications during failure-free periods, mostly coming from the payload copy introduced for each message. In this paper, we investigate different approaches for logging payload and compare their impact on network performance.
UR - http://www.scopus.com/inward/record.url?scp=78149231438&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15646-5_20
DO - 10.1007/978-3-642-15646-5_20
M3 - Conference contribution
AN - SCOPUS:78149231438
SN - 3642156452
SN - 9783642156458
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 189
EP - 197
BT - Recent Advances in the Message Passing Interface - 17th European MPI Users' Group Meeting, EuroMPI 2010, Proceedings
T2 - 17th European MPI Users' Group Meeting, EuroMPI 2010
Y2 - 12 September 2010 through 15 September 2010
ER -