TY - GEN
T1 - Tango
T2 - 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
AU - Qiao, Zhenbo
AU - Tian, Qirui
AU - Qin, Zhenlu
AU - Wang, Jinzhen
AU - Liu, Qin G.
AU - Podhorszki, Norbert
AU - Klasky, Scott
AU - Zhu, Hongjian
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - As simulation-based scientific discovery advances to exascale, a major question that the community is striving to answer is how to co-design data storage and complex physics-rich analytics so that the time to knowledge is minimized for post-processing. A particular challenge is how to accommodate a broad spectrum of data analytics needs, particularly those that become clear only very late during post-processing, a scenario where existing methods, such as in situ processing, are unable or less effective in supporting data analytics. As HPC storage systems have become deeper and more complex with the recent addition of NVMe, die-stacked memory, and burst buffers, paradigms and methods for data storage and analysis need to be fundamentally rethought. This paper aims to address the issue of I/O interference for data analytics over local ephemeral storage, which is shared by multiple applications in a non-exclusive node usage scenario, often configured for small- to medium-sized clusters. At the core of this work is a coordinated cross-layer approach that reacts to storage interference from both the storage and application layers. By decomposing and distributing analysis data across the storage hierarchy, data analytics can adapt to interference by reducing or completely avoiding access to lower tiers whenever interference is high, while maintaining a prescribed error bound to limit the information loss. Meanwhile, proper actions are also taken at the storage layer to ensure that sufficient bandwidth is allocated for retrieving an augmentation, based upon the cardinality and accuracy of the augmentation as well as the nature of the application. We evaluate three real-world data analytics applications, XGC, GenASiS, and CFD, on Chameleon, and quantitatively demonstrate that the I/O performance can be vastly improved, e.g., by 52% versus no adaptivity and 36% versus single-layer adaptivity, while maintaining acceptable outcomes of data analysis.
KW - data analysis
KW - data storage
KW - High-performance computing
UR - http://www.scopus.com/inward/record.url?scp=85214978210&partnerID=8YFLogxK
U2 - 10.1109/SC41406.2024.00020
DO - 10.1109/SC41406.2024.00020
M3 - Conference contribution
AN - SCOPUS:85214978210
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2024
PB - IEEE Computer Society
Y2 - 17 November 2024 through 22 November 2024
ER -