TY - GEN
T1 - Practical scalable consensus for pseudo-synchronous distributed systems
AU - Herault, Thomas
AU - Bouteiller, Aurelien
AU - Bosilca, George
AU - Gamell, Marc
AU - Teranishi, Keita
AU - Parashar, Manish
AU - Dongarra, Jack
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/11/15
Y1 - 2015/11/15
N2 - The ability to consistently handle faults in a distributed environment requires, among a small set of basic routines, an agreement algorithm allowing surviving entities to reach a consensual decision between a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous systems, which optimistically allows a process to resume its activity while guaranteeing strong progress. We prove the correctness of our ERA algorithm, and expose its logarithmic behavior, which is an extremely desirable property for any algorithm which targets future exascale platforms. We detail a practical implementation of this consensus algorithm in the context of an MPI library, and evaluate both its efficiency and scalability through a set of benchmarks and two fault tolerant scientific applications.
AB - The ability to consistently handle faults in a distributed environment requires, among a small set of basic routines, an agreement algorithm allowing surviving entities to reach a consensual decision between a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous systems, which optimistically allows a process to resume its activity while guaranteeing strong progress. We prove the correctness of our ERA algorithm, and expose its logarithmic behavior, which is an extremely desirable property for any algorithm which targets future exascale platforms. We detail a practical implementation of this consensus algorithm in the context of an MPI library, and evaluate both its efficiency and scalability through a set of benchmarks and two fault tolerant scientific applications.
KW - MPI
KW - agreement
KW - fault-tolerance
UR - http://www.scopus.com/inward/record.url?scp=84966671012&partnerID=8YFLogxK
U2 - 10.1145/2807591.2807665
DO - 10.1145/2807591.2807665
M3 - Conference contribution
AN - SCOPUS:84966671012
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2015
PB - IEEE Computer Society
T2 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
Y2 - 15 November 2015 through 20 November 2015
ER -