An evaluation of the state of time synchronization on leadership class supercomputers

Terry Jones, George Ostrouchov, Gregory A. Koenig, Oscar H. Mondragon, Patrick G. Bridges

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, explain why the current state of the field is inadequate, and propose strategies to address the observed shortcomings.

Original language: English
Article number: e4341
Journal: Concurrency and Computation: Practice and Experience
Volume: 30
Issue number: 4
DOIs
State: Published - Feb 25 2018

Funding

The authors would like to thank Don Maxwell of the National Center for Computational Science for his assistance in collecting these numbers. The graphics and statistical analysis contained in this paper were done with the R language and environment for statistical computing. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. In addition, this research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Finally, this research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Keywords

  • clock synchronization
  • large-scale systems
  • system software
  • time service
