TY - GEN
T1 - MLOC
T2 - 41st International Conference on Parallel Processing, ICPP 2012
AU - Gong, Zhenhuan
AU - Rogers, Terry
AU - Jenkins, John
AU - Kolla, Hemanth
AU - Ethier, Stephane
AU - Chen, Jackie
AU - Ross, Robert
AU - Klasky, Scott
AU - Samatova, Nagiza F.
PY - 2012
Y1 - 2012
AB - The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. The growing gap is exacerbated by exploratory data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming integral parts of I/O systems. While addressing the data size issue, these techniques add yet another mix of access patterns to an already heterogeneous set of possibilities. Moreover, how extreme-scale datasets are partitioned into multiple files and organized on a parallel file system adds to an already combinatorial space of possible access patterns. To address this challenge, we present MLOC, a parallel Multilevel Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable effective data exploration with various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture, in which the levels can be flexibly re-ordered according to user-defined priorities. When tested on query-driven exploration of compressed data, MLOC demonstrates superior performance compared to state-of-the-art scientific database management technologies.
UR - http://www.scopus.com/inward/record.url?scp=84871133248&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2012.39
DO - 10.1109/ICPP.2012.39
M3 - Conference contribution
AN - SCOPUS:84871133248
SN - 9780769547961
T3 - Proceedings of the International Conference on Parallel Processing
SP - 239
EP - 248
BT - Proceedings - 41st International Conference on Parallel Processing, ICPP 2012
Y2 - 10 September 2012 through 13 September 2012
ER -