Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System

Lipeng Wan, Matthew Wolf, Feiyi Wang, Jong Youl Choi, George Ostrouchov, Scott Klasky

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

18 Scopus citations

Abstract

With the increase of the scale and intensity of the parallel I/O workloads generated by those scientific applications running on high performance computing facilities, understanding the I/O dynamics, especially the root cause of the I/O performance variability and degradation in HPC environment, have become extremely critical to the HPC community. In this paper, we run extensive I/O measuring tests on a production leadership-class storage system to capture the performance variabilities of large-scale parallel I/O. Analyzing these results and its statistic correlation revealed some valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings and propose an I/O middleware design refactoring which can improve the performance of the parallel I/O by optimizing the data striping and placement. Our preliminary evaluation results demonstrate the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.

Original languageEnglish
Title of host publicationProceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017
EditorsKisung Lee, Ling Liu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1022-1031
Number of pages10
ISBN (Electronic)9781538617915
DOIs
StatePublished - Jul 13 2017
Event37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017 - Atlanta, United States
Duration: Jun 5 2017Jun 8 2017

Publication series

NameProceedings - International Conference on Distributed Computing Systems

Conference

Conference37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017
Country/TerritoryUnited States
CityAtlanta
Period06/5/1706/8/17

Fingerprint

Dive into the research topics of 'Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System'. Together they form a unique fingerprint.

Cite this