Optimizing I/O forwarding techniques for extreme-scale event tracing

Thomas Ilsche, Joseph Schuchart, Jason Cope, Dries Kimpe, Terry Jones, Andreas Knüpfer, Kamil Iskra, Robert Ross, Wolfgang E. Nagel, Stephen Poole

Research output: Contribution to journal, Article, peer-review

4 Scopus citations

Abstract

Programming development tools are a vital component for understanding the behavior of parallel applications. Event tracing is a principal ingredient of these tools, but new and serious challenges place event tracing at risk on extreme-scale machines. As the quantity of captured events increases with concurrency, the additional data can overload the parallel file system and perturb the application being observed. In this work we present a solution for event tracing on extreme-scale machines. We enhance an I/O forwarding software layer to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system. Furthermore, we introduce a sophisticated write buffering capability to limit the impact. To validate the approach, we employ the Vampir tracing toolset using these new capabilities. Our results demonstrate that the approach increases the maximum traced application size by a factor of 5, to more than 200,000 processes.

Original language: English
Pages (from-to): 1-18
Number of pages: 18
Journal: Cluster Computing
Volume: 17
Issue number: 1
State: Published - Mar 2014

Funding

Acknowledgements: We thank Ramanan Sankaran (ORNL) for providing a working version of S3D as well as a benchmark problem set for JaguarPF. We are grateful to Matthias Jurenz for his assistance with VampirTrace, and to Matthias Weber and Ronald Geisler for their support with Vampir. The IOFSL project is supported by the DOE Office of Science and the National Nuclear Security Administration (NNSA). This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which are supported by the Office of Science of the U.S. Department of Energy under contracts DE-AC02-06CH11357 and DE-AC05-00OR22725, respectively. This work was supported in part by the National Science Foundation (NSF) through NSF-0937928 and NSF-0724599. This work is supported in part by the German Research Foundation (DFG) in the Collaborative Research Center 912 “Highly Adaptive Energy-Efficient Computing”.

Funders and funder numbers
National Science Foundation: NSF-0937928, NSF-0724599
U.S. Department of Energy: DE-AC05-00OR22725, DE-AC02-06CH11357
Office of Science
National Nuclear Security Administration
Argonne National Laboratory
Deutsche Forschungsgemeinschaft

    Keywords

    • Atomic append
    • Event tracing
    • I/O forwarding
