Abstract
As scientific simulations and experiments move toward extremely large scales and generate massive amounts of data, the data access performance of analytic applications becomes crucial. The patterns in which data is written and later read often do not match, which typically results in poor read performance. Data layout reorganization has been used to improve the locality of data accesses. However, current data reorganizations are static and focus on generating one or more optimized layouts that rely on prior knowledge of exact future access patterns. We propose a framework that dynamically recognizes data usage patterns, replicates the data of interest in multiple reorganized layouts that benefit common read patterns, and decides at runtime which layout to use for a given read pattern. The framework supports reading individual elements and chunks of a multi-dimensional array of variables. Our pattern-driven layout selection strategy achieves multi-fold speedups compared to reading from the original dataset.
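The runtime layout selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the framework's cost model, so the one below is an assumption: it considers a 2-D array replicated in row-major and column-major order and picks the replica that needs the fewest contiguous runs (a proxy for seeks) to serve a rectangular read. The names `Chunk`, `contiguous_runs`, and `select_layout` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Half-open rectangular read request on a 2-D array (hypothetical type)."""
    row_start: int
    row_stop: int
    col_start: int
    col_stop: int


def contiguous_runs(chunk, shape, layout):
    """Count the contiguous runs needed to read `chunk` from one replica.

    Assumed cost model: fewer runs means fewer seeks. `layout` is
    "row_major" or "col_major".
    """
    n_rows, n_cols = shape
    rows = chunk.row_stop - chunk.row_start
    cols = chunk.col_stop - chunk.col_start
    if rows <= 0 or cols <= 0:
        return 0  # empty request, nothing to read
    if layout == "row_major":
        # Full-width rows are adjacent in storage and merge into one run;
        # otherwise each requested row is its own run.
        return 1 if cols == n_cols else rows
    if layout == "col_major":
        # Symmetric case: full-height columns merge into one run.
        return 1 if rows == n_rows else cols
    raise ValueError(f"unknown layout: {layout}")


def select_layout(chunk, shape, replicas=("row_major", "col_major")):
    """Pick the replica with the fewest runs; ties go to the first listed."""
    return min(replicas, key=lambda layout: contiguous_runs(chunk, shape, layout))


if __name__ == "__main__":
    shape = (1000, 1000)
    one_column = Chunk(0, 1000, 5, 6)  # read a whole column
    one_row = Chunk(5, 6, 0, 1000)     # read a whole row
    print(select_layout(one_column, shape))
    print(select_layout(one_row, shape))
```

Under this assumed model, a whole-column read costs 1000 runs in the row-major replica but a single run in the column-major one, so the column-major replica is selected. A real system would also need to weigh replica storage cost and how often each read pattern occurs, which this sketch leaves out.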
Original language | English |
---|---|
Title of host publication | Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 356-365 |
Number of pages | 10 |
ISBN (Electronic) | 9781509024520 |
DOIs | |
State | Published - Jul 18 2016 |
Event | 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 - Cartagena, Colombia; Duration: May 16 2016 → May 19 2016 |
Publication series
Name | Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 |
---|---|
Conference
Conference | 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 |
---|---|
Country/Territory | Colombia |
City | Cartagena |
Period | 05/16/16 → 05/19/16 |
Funding
This work is supported in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research under contracts DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory and DE-AC05-00OR22725 at Oak Ridge National Laboratory, and by the U.S. National Science Foundation (Expeditions in Computing and EAGER program). This research used resources from the National Energy Research Scientific Computing Center and Oak Ridge Leadership Computing Facility.
Keywords
- data access performance
- data layout reorganization
- data usage pattern