Comparative I/O workload characterization of two leadership class storage clusters

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

33 Scopus citations

Abstract

The Oak Ridge Leadership Computing Facility (OLCF) is a leader in large-scale parallel file system development, design, deployment, and continuous operation. Over the last decade, the OLCF has designed and deployed two large center-wide parallel file systems. The first instantiation, Spider 1, served the Jaguar supercomputer; its successor, Spider 2, now serves the Titan supercomputer, among many other OLCF computational resources. The OLCF has been rigorously collecting file and storage system statistics from these Spider systems since their transition to production state. In this paper we present the collected I/O workload statistics from the Spider 2 system and compare them to the Spider 1 data. Our analysis shows that the Spider 2 workload is more write-heavy than that of Spider 1 (75% vs. 60%, respectively). The data also show that OLCF storage policies such as periodic purges are effectively managing the capacity resource of Spider 2. Furthermore, due to improvements in TDM multipath and ib_srp software, we are utilizing the Spider 2 system bandwidth and latency resources more effectively. The Spider 2 bandwidth usage statistics show that our system is working within the design specifications. However, it is also evident that our scientific applications can be more effectively served by a burst buffer storage layer. All the data has been collected by monitoring tools developed for the Spider ecosystem. We believe the observed data set and insights will help us better design the next-generation Spider file and storage system. They will also be helpful to the larger community for building more effective large-scale file and storage systems.

Original language: English
Title of host publication: Proceedings of PDSW 2015
Subtitle of host publication: 10th Parallel Data Storage Workshop - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery, Inc
Pages: 31-36
Number of pages: 6
ISBN (Electronic): 9781450340083
State: Published - Nov 15 2015
Event: 10th Parallel Data Storage Workshop, PDSW 2015 - Austin, United States
Duration: Nov 16 2015 → …

Publication series

Name: Proceedings of PDSW 2015: 10th Parallel Data Storage Workshop - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 10th Parallel Data Storage Workshop, PDSW 2015
Country/Territory: United States
City: Austin
Period: 11/16/15 → …
