TY - GEN
T1 - Usage Pattern-Driven Dynamic Data Layout Reorganization
AU - Tang, Houjun
AU - Byna, Suren
AU - Harenberg, Steve
AU - Zou, Xiaocheng
AU - Zhang, Wenzhao
AU - Wu, Kesheng
AU - Dong, Bin
AU - Rubel, Oliver
AU - Bouchard, Kristofer
AU - Klasky, Scott
AU - Samatova, Nagiza F.
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/7/18
Y1 - 2016/7/18
AB - As scientific simulations and experiments move toward extremely large scales and generate massive amounts of data, the data access performance of analytic applications becomes crucial. A mismatch often happens between write and read patterns of data accesses, typically resulting in poor read performance. Data layout reorganization has been used to improve the locality of data accesses. However, current data reorganizations are static and focus on generating a single (or set of) optimized layouts that rely on prior knowledge of exact future access patterns. We propose a framework that dynamically recognizes the data usage patterns, replicates the data of interest in multiple reorganized layouts that would benefit common read patterns, and makes runtime decisions on selecting a favorable layout for a given read pattern. This framework supports reading individual elements and chunks of a multi-dimensional array of variables. Our pattern-driven layout selection strategy achieves multi-fold speedups compared to reading from the original dataset.
KW - data access performance
KW - data layout reorganization
KW - data usage pattern
UR - http://www.scopus.com/inward/record.url?scp=84983380872&partnerID=8YFLogxK
U2 - 10.1109/CCGrid.2016.15
DO - 10.1109/CCGrid.2016.15
M3 - Conference contribution
AN - SCOPUS:84983380872
T3 - Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
SP - 356
EP - 365
BT - Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Y2 - 16 May 2016 through 19 May 2016
ER -