MLOC: Multi-level layout optimization framework for compressed scientific data exploration with heterogeneous access patterns

Zhenhuan Gong, Terry Rogers, John Jenkins, Hemanth Kolla, Stephane Ethier, Jackie Chen, Robert Ross, Scott Klasky, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

22 Scopus citations

Abstract

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. This gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming an integral part of I/O systems. While addressing the data size issue, these techniques add yet more access patterns to an already heterogeneous mix. Moreover, the way extreme-scale datasets are partitioned into multiple files and organized on a parallel file system further enlarges an already combinatorial space of possible access patterns. To address this challenge, we present MLOC, a parallel Multilevel Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable effective data exploration under various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture in which the levels can be flexibly reordered according to user-defined priorities. When tested on query-driven exploration of compressed data, MLOC demonstrates superior performance compared to state-of-the-art scientific database management technologies.
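The abstract describes a multi-level layout whose levels can be reordered by user-defined priority. The Python sketch below is a hypothetical illustration of that general idea, not MLOC's actual implementation or on-disk format: it treats each level as one dimension of a chunk index, places chunks so that the highest-priority level varies slowest, and counts the contiguous runs a read touches as a rough proxy for seeks. The level names, extents, and the functions `layout_order`, `offsets`, and `seeks_for` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical levels and extents; MLOC's real kernels are not specified in this abstract.
LEVEL_EXTENTS = {
    "variable": 2,    # simulation variables
    "time": 3,        # timesteps
    "block": 4,       # spatial blocks (targets of spatio-temporal queries)
    "resolution": 2,  # multi-resolution levels, 0 = coarsest
}


def layout_order(priority):
    """Return chunk keys in on-disk order; priority[0] varies slowest."""
    if sorted(priority) != sorted(LEVEL_EXTENTS):
        raise ValueError("priority must be a permutation of the level names")
    ranges = [range(LEVEL_EXTENTS[name]) for name in priority]
    # itertools.product varies its last iterable fastest.
    return [dict(zip(priority, combo)) for combo in product(*ranges)]


def offsets(priority):
    """Map each chunk key, as a tuple in canonical level order, to its slot."""
    canonical = list(LEVEL_EXTENTS)
    return {
        tuple(key[name] for name in canonical): slot
        for slot, key in enumerate(layout_order(priority))
    }


def seeks_for(priority, wanted):
    """Count contiguous slot runs touched when reading the wanted chunks."""
    slots = sorted(offsets(priority)[key] for key in wanted)
    runs = 1 if slots else 0
    for prev, cur in zip(slots, slots[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


if __name__ == "__main__":
    # Coarse view of every spatial block of variable 0 at timestep 1,
    # keyed as (variable, time, block, resolution).
    wanted = [(0, 1, b, 0) for b in range(4)]
    print(seeks_for(["resolution", "variable", "time", "block"], wanted))
    print(seeks_for(["block", "time", "variable", "resolution"], wanted))
```

With resolution prioritized, the coarse read above falls into a single contiguous run; with spatial blocks prioritized, the same read is split into four runs. The sketch assumes fixed-size chunks in a single file. Compressed chunks have variable sizes and multi-file partitioning changes what counts as contiguous, so a real layout would also need an index of chunk offsets.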

Original language: English
Title of host publication: Proceedings - 41st International Conference on Parallel Processing, ICPP 2012
Pages: 239-248
Number of pages: 10
State: Published - 2012
Event: 41st International Conference on Parallel Processing, ICPP 2012 - Pittsburgh, PA, United States
Duration: Sep 10 2012 to Sep 13 2012

Publication series

Name: Proceedings of the International Conference on Parallel Processing
ISSN (Print): 0190-3918

Conference

Conference: 41st International Conference on Parallel Processing, ICPP 2012
Country/Territory: United States
City: Pittsburgh, PA
Period: 09/10/12 to 09/13/12
