Using Balanced Data Placement to Address I/O Contention in Production Environments

Sarah Neuwirth, Feiyi Wang, Sarp Oral, Sudharshan Vazhkudai, James Rogers, Ulrich Bruening

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

Designed for capacity and capability, HPC I/O systems are inherently complex and shared among multiple concurrent jobs competing for resources. The lack of centralized coordination and control often renders the end-to-end I/O paths vulnerable to load imbalance and contention. With the emergence of data-intensive HPC applications, storage systems face additional contention that limits performance and scalability. This paper proposes to unify two key approaches to tackle the imbalanced use of I/O resources and to achieve an end-to-end I/O performance improvement in the most transparent way. First, it utilizes a topology-aware Balanced Placement I/O method (BPIO) for mitigating resource contention. Second, it takes advantage of the platform-neutral ADIOS middleware, which provides a flexible I/O mechanism for scientific applications. By integrating BPIO with ADIOS, an integration referred to as Aequilibro, we obtain an end-to-end and per-job I/O performance improvement for ADIOS-enabled HPC applications without requiring any code changes. Aequilibro can be applied to almost any HPC platform and is best suited to systems that lack a centralized file system resource manager. We demonstrate the effectiveness of our integration on the Titan system at the Oak Ridge National Laboratory. Our experiments with a synthetic benchmark and a real-world HPC workload show that, even in a noisy production environment, Aequilibro can improve large-scale application performance significantly.
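The abstract does not describe how BPIO chooses placements, so the following is only a generic illustration of the balanced-placement idea, not the authors' algorithm. It is a minimal sketch that greedily assigns each I/O request to the least-loaded target. The function name `place_balanced`, the request costs, and the target names are all hypothetical, and topology awareness is deliberately left out.

```python
def place_balanced(requests, targets):
    """Greedy least-loaded placement (illustrative only, not BPIO itself).

    requests: list of (request_id, cost) pairs; cost is a hypothetical weight
    targets:  list of target names, e.g. storage targets
    Returns (placement, load): request_id -> target, and target -> total cost.
    """
    load = {t: 0 for t in targets}
    placement = {}
    # Handle the most expensive requests first; this usually evens out the load.
    for req_id, cost in sorted(requests, key=lambda r: -r[1]):
        # Pick the target with the smallest accumulated cost so far.
        # Ties go to the target that appears first in `targets`.
        target = min(targets, key=lambda t: load[t])
        placement[req_id] = target
        load[target] += cost
    return placement, load


if __name__ == "__main__":
    demo_requests = [("a", 4), ("b", 3), ("c", 2), ("d", 1)]
    placement, load = place_balanced(demo_requests, ["t0", "t1"])
    print(placement)
    print(load)
```

A topology-aware method would additionally weigh the shared components along each I/O path (for example routers or storage servers) rather than only a single per-target load figure, which this sketch does not attempt.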

Original language: English
Title of host publication: Proceedings - 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016
Publisher: IEEE Computer Society
Pages: 9-17
Number of pages: 9
ISBN (Electronic): 9781509061082
State: Published - Dec 16 2016
Event: 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016 - Los Angeles, United States
Duration: Oct 26 2016 to Oct 28 2016

Publication series

Name: Proceedings - Symposium on Computer Architecture and High Performance Computing
ISSN (Print): 1550-6533

Conference

Conference: 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016
Country/Territory: United States
City: Los Angeles
Period: 10/26/16 to 10/28/16

Keywords

  • High Performance Computing
  • Load Balancing
  • Parallel File System
  • Performance Evaluation
