TY - GEN
T1 - Using Balanced Data Placement to Address I/O Contention in Production Environments
AU - Neuwirth, Sarah
AU - Wang, Feiyi
AU - Oral, Sarp
AU - Vazhkudai, Sudharshan
AU - Rogers, James
AU - Bruening, Ulrich
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/12/16
Y1 - 2016/12/16
AB - Designed for capacity and capability, HPC I/O systems are inherently complex and shared among multiple, concurrent jobs competing for resources. Lack of centralized coordination and control often renders the end-to-end I/O paths vulnerable to load imbalance and contention. With the emergence of data-intensive HPC applications, storage systems face further contention for performance and scalability. This paper proposes to unify two key approaches to tackle the imbalanced use of I/O resources and to achieve an end-to-end I/O performance improvement in the most transparent way. First, it utilizes a topology-aware, Balanced Placement I/O method (BPIO) for mitigating resource contention. Second, it takes advantage of the platform-neutral ADIOS middleware, which provides a flexible I/O mechanism for scientific applications. By integrating BPIO with ADIOS, referred to as Aequilibro, we obtain an end-to-end and per-job I/O performance improvement for ADIOS-enabled HPC applications without requiring any code changes. Aequilibro can be applied to almost any HPC platform and is most suitable for systems that lack a centralized file system resource manager. We demonstrate the effectiveness of our integration on the Titan system at the Oak Ridge National Laboratory. Our experiments with a synthetic benchmark and a real-world HPC workload show that, even in a noisy production environment, Aequilibro can improve large-scale application performance significantly.
KW - High Performance Computing
KW - Load Balancing
KW - Parallel File System
KW - Performance Evaluation
UR - http://www.scopus.com/inward/record.url?scp=85010299091&partnerID=8YFLogxK
U2 - 10.1109/SBAC-PAD.2016.10
DO - 10.1109/SBAC-PAD.2016.10
M3 - Conference contribution
AN - SCOPUS:85010299091
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 9
EP - 17
BT - Proceedings - 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016
PB - IEEE Computer Society
T2 - 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016
Y2 - 26 October 2016 through 28 October 2016
ER -