TY - GEN
T1 - ISABELA-QA
T2 - 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
AU - Lakshminarasimhan, Sriram
AU - Jenkins, John
AU - Arkatkar, Isha
AU - Gong, Zhenhuan
AU - Kolla, Hemanth
AU - Ku, Seung Hoe
AU - Ethier, Stephane
AU - Chen, Jackie
AU - Chang, C. S.
AU - Klasky, Scott
AU - Latham, Robert
AU - Ross, Robert
AU - Samatova, Nagiza F.
PY - 2011
Y1 - 2011
N2 - Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top-notch priority. The increasing simulation output data sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed (rather than original, full-size) data is a promising strategy for meeting storage- and I/O-bound application challenges. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatiotemporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is light-weight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, for the variables that satisfy user-specified range queries, a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatiotemporal floating-point double- or single-precision data, while offering a light-weight memory and disk storage footprint solution with parallel, scalable, multi-node, multi-core, GPU-based query processing.
AB - Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top-notch priority. The increasing simulation output data sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed (rather than original, full-size) data is a promising strategy for meeting storage- and I/O-bound application challenges. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatiotemporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is light-weight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, for the variables that satisfy user-specified range queries, a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatiotemporal floating-point double- or single-precision data, while offering a light-weight memory and disk storage footprint solution with parallel, scalable, multi-node, multi-core, GPU-based query processing.
KW - Compression
KW - Data reduction
KW - Data-intensive computing
KW - High performance applications
KW - Query-driven analytics
UR - http://www.scopus.com/inward/record.url?scp=83155160935&partnerID=8YFLogxK
U2 - 10.1145/2063384.2063425
DO - 10.1145/2063384.2063425
M3 - Conference contribution
AN - SCOPUS:83155160935
SN - 9781450307710
T3 - Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
BT - Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
Y2 - 12 November 2011 through 18 November 2011
ER -