ISABELA-QA: Query-driven analytics with ISABELA-compressed extreme-scale scientific data

  • Sriram Lakshminarasimhan
  • John Jenkins
  • Isha Arkatkar
  • Zhenhuan Gong
  • Hemanth Kolla
  • Seung Hoe Ku
  • Stephane Ethier
  • Jackie Chen
  • C. S. Chang
  • Scott Klasky
  • Robert Latham
  • Robert Ross
  • Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Scopus citations

Abstract

Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top priority. Increasing simulation output sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed data, rather than the original full-size data, is a promising strategy for meeting the challenges of storage- and I/O-bound applications. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatiotemporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is lightweight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, for the variables that satisfy user-specified range queries, a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatiotemporal floating-point double- or single-precision data, while offering a lightweight memory and disk storage footprint with parallel, scalable, multi-node, multi-core, GPU-based query processing.

Original language: English
Title of host publication: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery
ISBN (Print): 9781450307710
DOIs
State: Published - Nov 12 2011
Event: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2011 - Seattle, WA, United States
Duration: Nov 12 2011 → Nov 18 2011

Publication series

Name: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2011
Country/Territory: United States
City: Seattle, WA
Period: 11/12/11 → 11/18/11

Keywords

  • Compression
  • Data reduction
  • Data-intensive computing
  • High performance applications
  • Query-driven analytics
