Abstract
We present the design and implementation of Data Jockey, a data management system for HPC multi-tiered storage systems. As a centralized data management control plane, Data Jockey automates bulk data movement and placement for scientific workflows and integrates into existing HPC storage infrastructures. Data Jockey simplifies data management by eliminating human effort in programming complex data movements, laying datasets across multiple storage tiers when supporting complex workflows, which in turn increases the usability of multi-tiered storage systems emerging in modern HPC data centers. Specifically, Data Jockey presents a new data management scheme called “goal driven data management” that can automatically infer low-level bulk data movement plans from declarative high-level goal statements that come from the lifetime of iterative runs of scientific workflows. While doing so, Data Jockey aims to minimize data wait times by taking responsibility for datasets that are unused or to be used, and aggressively utilizing the capacity of the upper, higher performant storage tiers. We evaluated a prototype implementation of Data Jockey under a synthetic workload based on a year's worth of Oak Ridge Leadership Computing Facility's (OLCF) operational logs. Our evaluations suggest that Data Jockey leads to higher utilization of the upper storage tiers while minimizing the programming effort of data movement compared to human involved, per-domain ad-hoc data management scripts.
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 511-522 |
Number of pages | 12 |
ISBN (Electronic) | 9781728112466 |
DOIs | |
State | Published - May 2019 |
Event | 33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019 - Rio de Janeiro, Brazil Duration: May 20 2019 → May 24 2019 |
Publication series
Name | Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019 |
---|
Conference
Conference | 33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019 |
---|---|
Country/Territory | Brazil |
City | Rio de Janeiro |
Period | 05/20/19 → 05/24/19 |
Bibliographical note
Publisher Copyright:© 2019 IEEE