User-based I/O Profiling for Leadership Scale HPC Workloads

Ahmad Hossein Yazdani, Arnab K. Paul, Ahmad Maroof Karimi, Feiyi Wang, Ali Butt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

I/O accounts for a significant portion of the run-time of most applications. Running many such applications concurrently on an HPC system leads to severe I/O contention. Understanding and subsequently reducing the I/O contention induced by such multi-tenancy is therefore critical for the efficient and reliable performance of the HPC system. In this study, we demonstrate that an application's performance is influenced by the command-line arguments passed at job submission. We model an application's I/O behavior based on two factors: past I/O behavior within a time window and user-configured I/O settings passed via command-line arguments. We conclude that I/O patterns for well-known HPC applications such as E3SM and LAMMPS are predictable, with an average uncertainty below 0.25 (a probability of 80%), and near zero (a probability of 100%) within a day. However, I/O pattern variance increases as the study time window lengthens. Additionally, we show that for 38 users and at least 50 applications, constituting approximately 93,000 job submissions, there is a high correlation between a submitted command line and the command lines the same user submitted within the preceding 1 to 10 days. We claim the length of this time window is unique per user.
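To make the idea of comparing a submitted command line against a user's recent submissions concrete, the following is a minimal sketch. It is not the authors' method: the abstract does not name the correlation measure, so token-level Jaccard similarity is used here purely as an assumed stand-in. The function names, the example command lines, and the `window_days` parameter are all hypothetical.

```python
from datetime import datetime, timedelta


def tokenize(cmd):
    """Split a command line into a set of whitespace-separated tokens."""
    return set(cmd.split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed metric, not from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match_in_window(history, new_time, new_cmd, window_days):
    """Highest similarity between new_cmd and the same user's past command
    lines submitted within the last `window_days` days.

    history: list of (datetime, command_line) tuples for one user.
    Returns 0.0 when no past submission falls inside the window.
    """
    start = new_time - timedelta(days=window_days)
    new_tokens = tokenize(new_cmd)
    scores = [
        jaccard(new_tokens, tokenize(cmd))
        for submitted_at, cmd in history
        if start <= submitted_at < new_time
    ]
    return max(scores, default=0.0)


# Hypothetical submission history for a single user.
history = [
    (datetime(2024, 1, 1, 9), "srun -n 64 ./app --io-mode collective --out run1"),
    (datetime(2024, 1, 8, 9), "srun -n 64 ./app --io-mode collective --out run2"),
]
new_time = datetime(2024, 1, 10, 9)
new_cmd = "srun -n 64 ./app --io-mode collective --out run3"

# Sweep the window length, mirroring the 1-to-10-day range in the abstract.
for days in (1, 3, 10):
    score = best_match_in_window(history, new_time, new_cmd, days)
    print(days, round(score, 3))
```

Sweeping `window_days` per user is one simple way to explore the claim that the useful window length differs from user to user; a real analysis would need the measure and data the paper actually uses.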

Original language: English
Title of host publication: ICDCN 2025 - Proceedings of the 26th International Conference on Distributed Computing and Networking
Publisher: Association for Computing Machinery, Inc
Pages: 181-190
Number of pages: 10
ISBN (Electronic): 9798400710629
State: Published - Jan 4 2025
Event: 26th International Conference on Distributed Computing and Networking, ICDCN 2025 - Hyderabad, India
Duration: Jan 4 2025 to Jan 7 2025

Publication series

Name: ICDCN 2025 - Proceedings of the 26th International Conference on Distributed Computing and Networking

Conference

Conference: 26th International Conference on Distributed Computing and Networking, ICDCN 2025
Country/Territory: India
City: Hyderabad
Period: 01/4/25 to 01/7/25

Funding

We thank our anonymous reviewers for their detailed feedback and valuable suggestions. This work is sponsored in part by the NSF under grants CSR-2106634, CCF-1919113/1919075, CNS-2045680, OAC-2004751, and OAC-2106446; the Office of Science of the U.S. Department of Energy under grant DE-AC05-00OR22725; SERB, Govt. of India, under Start-up Research Grant SRG/2023/002445; BITS CRDF under grant C1/23/173; and BITS Pilani under grants BBF/BITS(G)/FY2022-23/BCPS-123/24-25/R1 and GOA/ACG/2022-2023/Oct/11. Results presented in this paper were obtained using the OLCF at Oak Ridge National Laboratory.

Keywords

  • Darshan
  • High Performance Computing
  • I/O characterization
  • I/O profiling
  • I/O scheduler
