Improving large-scale storage system performance via topology-aware and balanced data placement

Feiyi Wang, Sarp Oral, Saurabh Gupta, Devesh Tiwari, Sudharshan S. Vazhkudai

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

17 Scopus citations

Abstract

With the advent of big data, the I/O subsystems of large-scale compute clusters are becoming a center of focus, as more applications place greater demands on end-to-end I/O performance. These subsystems are often complex in design: they comprise multiple hardware and software layers to cope with the increasing capacity, capability, and scalability requirements of data-intensive applications. However, the shared nature of storage resources and the intrinsic interactions across these layers make it a great challenge to realize end-to-end performance gains. This paper proposes a topology-aware strategy that balances the load across resources to improve per-application I/O performance. We demonstrate the effectiveness of our algorithm on an extreme-scale compute cluster, Titan, at the Oak Ridge Leadership Computing Facility (OLCF). Our experiments with both synthetic benchmarks and a real-world application show that, even under congestion, our proposed algorithm can significantly improve large-scale application I/O performance, resulting in both a reduction in application run time and a higher simulation resolution.
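The abstract does not spell out the placement algorithm itself. As a rough illustration of what "topology-aware and balanced" placement can mean, here is a minimal Python sketch. It assumes a Lustre-like layout in which each storage target (OST) is served by an object storage server (OSS) and each target reports a single scalar load value. The names `Target` and `place_stripes` and the load metric are hypothetical. The sketch is a generic greedy heuristic, not the algorithm evaluated in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Target:
    """A storage target (e.g., an OST) attached to a server in the topology."""
    name: str
    server: str
    load: float  # hypothetical current load metric; lower is better


def place_stripes(targets, stripe_count):
    """Greedily pick stripe_count targets, spreading them across servers.

    Each round prefers the server with the fewest targets chosen so far,
    breaking ties by that server's least-loaded remaining target, and then
    takes that target. Per-server balance therefore comes first, and
    per-target load is used only as a tie-breaker.
    """
    if stripe_count > len(targets):
        raise ValueError("not enough targets for the requested stripe count")

    # Group candidate targets by the server they hang off, least loaded first.
    by_server = defaultdict(list)
    for t in targets:
        by_server[t.server].append(t)
    for pool in by_server.values():
        pool.sort(key=lambda t: (t.load, t.name))

    used_per_server = defaultdict(int)
    chosen = []
    while len(chosen) < stripe_count:
        # Only servers that still have unused targets are eligible.
        eligible = [s for s, pool in by_server.items() if pool]
        server = min(
            eligible,
            key=lambda s: (used_per_server[s], by_server[s][0].load, s),
        )
        chosen.append(by_server[server].pop(0))
        used_per_server[server] += 1
    return chosen


# Usage: five targets behind three servers (hypothetical loads).
targets = [
    Target("ost0", "oss0", 0.10),
    Target("ost1", "oss0", 0.15),
    Target("ost2", "oss1", 0.50),
    Target("ost3", "oss1", 0.40),
    Target("ost4", "oss2", 0.20),
]
print([t.name for t in place_stripes(targets, 3)])  # ['ost0', 'ost4', 'ost3']
```

A purely load-ordered pick would place two of the three stripes on `oss0` (`ost0` and `ost1`). The sketch instead spreads the stripes across all three servers, at the cost of using a somewhat more loaded target (`ost3`). The trade-off illustrates why balancing across shared layers, and not only across individual targets, can matter under contention.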

Original language: English
Title of host publication: 2014 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014 - Proceedings
Publisher: IEEE Computer Society
Pages: 656-663
Number of pages: 8
ISBN (Electronic): 9781479976157
DOIs
State: Published - 2014
Event: 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014 - Hsinchu, Taiwan, Province of China
Duration: Dec 16 2014 to Dec 19 2014

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 2015-April
ISSN (Print): 1521-9097

Conference

Conference: 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014
Country/Territory: Taiwan, Province of China
City: Hsinchu
Period: 12/16/14 to 12/19/14

Keywords

  • High Performance Computing
  • Parallel File System
  • Performance Evaluation
  • Storage Area Network
