Scientific User Behavior and Data-Sharing Trends in a Petascale File System

Seung Hwan Lim, Hyogi Sim, Raghul Gunasekaran, Sudharshan S. Vazhkudai

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

The Oak Rrdge Leadership Computing Facility (OLCF) runs the No. 4 supercomputer in the world, supported by a petascale file system, to facilitate scientific discovery. In this paper, using the daily file system metadata snapshots collected over 500 days, we have studied the behavioral trends of 1,362 active users and 380 projects across 35 science domains. In particular, we have analyzed both individual and collective behavior of users and projects, highlighting needs from individual communities and the overall requirements to operate the file system. We have analyzed the metadata across three dimensions, namely (i) the projects' file generation and usage trends, using quantitative file system-centric metrics, (ii) scientific user behavior on the file system, and (iii) the data sharing trends of users and projects. To the best of our knowledge, our work is the first of its kind to provide comprehensive insights on user behavior from multiple science domains through metadata analysis of a large-scale shared file system. We envision that this OLCF case study will provide valuable insights for the design, operation, and management of storage systems at scale, and also encourage other HPC centers to undertake similar such efforts.CCS

Original languageEnglish
Title of host publicationSC 2017 - International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
ISBN (Electronic)9781450351140
DOIs
StatePublished - 2017
Event2017 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 - Denver, United States
Duration: Nov 12 2017Nov 17 2017

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume2017-November
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

Conference2017 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Country/TerritoryUnited States
CityDenver
Period11/12/1711/17/17

Funding

We would like to thank our shepherd, Neil Chue Hong, for his feedback. The work was supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT Battelle, LLC for the U.S. DOE (under the contract No. DE-AC05-00OR22725).

FundersFunder number
U.S. Department of EnergyDE-AC05-00OR22725

    Keywords

    • Distributed file systems
    • Usage measurement

    Fingerprint

    Dive into the research topics of 'Scientific User Behavior and Data-Sharing Trends in a Petascale File System'. Together they form a unique fingerprint.

    Cite this