Abstract
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
Original language | English |
---|---|
Article number | e4341 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 30 |
Issue number | 4 |
DOIs | |
State | Published - Feb 25 2018 |
Funding
The authors would like to thank Don Maxwell of the National Center for Computational Science for his assistance in collecting these numbers. The graphics and statistical analysis contained in this paper were done with the R language and environment for statistical computing. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. In addition, this research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Finally, this research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. In addition, this research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Finally, this research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Keywords
- clock synchronization
- large-scale systems
- system software
- time service