ISABELA-QA: Query-driven analytics with ISABELA-compressed extreme-scale scientific data

Sriram Lakshminarasimhan, John Jenkins, Isha Arkatkar, Zhenhuan Gong, Hemanth Kolla, Seung Hoe Ku, Stephane Ethier, Jackie Chen, C. S. Chang, Scott Klasky, Robert Latham, Robert Ross, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

46 Scopus citations

Abstract

Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top-notch priority. The increasing simulation output data sizes demand for a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed-rather than original, full-size-data is a promising strategy in order to meet storage-and-I/O-bound application challenges. As a proof-of-principle, we propose a parallel query processing engine, called ISABELA-QA that is designed and optimized for knowledge priors driven analytical processing of spatiotemporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is upto eight-fold less than what the existing state-of-the-art data management technologies that require storing both the original data and the index could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is light-weight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values rather than spatial regions for the variables that satisfy user-specified range queries-a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first techology that enables query-driven analytics over the compressed spatio-temporal floating-point double-or single-precision data, while offering a light-weight memory and disk storage footprint solution with parallel, scalable, multi-node, multi-core, GPU-based query processing.

Original languageEnglish
Title of host publicationProceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
DOIs
StatePublished - 2011
Event2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11 - Seattle, WA, United States
Duration: Nov 12 2011Nov 18 2011

Publication series

NameProceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
Country/TerritoryUnited States
CitySeattle, WA
Period11/12/1111/18/11

Funding

Keywords

  • Compression
  • Data reduction
  • Data-intensive computing
  • High performance applications
  • Query-driven analytics

Fingerprint

Dive into the research topics of 'ISABELA-QA: Query-driven analytics with ISABELA-compressed extreme-scale scientific data'. Together they form a unique fingerprint.

Cite this