ALACRITY: Analytics-driven lossless data compression for rapid in-situ indexing, storing, and querying

John Jenkins, Isha Arkatkar, Sriram Lakshminarasimhan, David A. Boyuka, Eric R. Schendel, Neil Shah, Stephane Ethier, Choong Seock Chang, Jackie Chen, Hemanth Kolla, Scott Klasky, Robert Ross, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

High-performance computing architectures face nontrivial data processing challenges, as computational and I/O components further diverge in performance trajectories. For scientific data analysis in particular, methods based on generating heavyweight access acceleration structures, e.g., indexes, are becoming less feasible for ever-increasing dataset sizes. We present ALACRITY, demonstrating the effectiveness of a fused data and index encoding of scientific, floating-point data in generating lightweight data structures amenable to common types of queries used in scientific data analysis. We exploit the representation of floating-point values by extracting significant bytes, using the resulting unique values to bin the remaining data along fixed-precision boundaries. To optimize the processing of queries with attribute range constraints, we use an inverted index that maps each generated bin to the list of records it contains. Overall, the storage footprint for both index and data is shown to be below that of numerous configurations of bitmap indexing, while query performance matches or exceeds theirs.
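The binning and inverted-index idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two-byte split (`HIGH_BYTES`), the in-memory dictionary layout, and all function names are assumptions chosen for illustration, and the sketch assumes finite double-precision inputs.

```python
import struct
from collections import defaultdict

HIGH_BYTES = 2            # assumption: number of significant bytes used as the bin key
LOW_BYTES = 8 - HIGH_BYTES


def build_index(values):
    """Bin each double by its high-order bytes; keep (record id, low-order bytes)."""
    bins = defaultdict(list)
    for record_id, value in enumerate(values):
        raw = struct.pack(">d", value)  # big-endian IEEE 754 double
        bins[raw[:HIGH_BYTES]].append((record_id, raw[HIGH_BYTES:]))
    return dict(bins)


def bin_bounds(high):
    """Smallest and largest value that can share this high-order prefix."""
    a = struct.unpack(">d", high + b"\x00" * LOW_BYTES)[0]
    b = struct.unpack(">d", high + b"\xff" * LOW_BYTES)[0]
    return min(a, b), max(a, b)  # order flips for negative prefixes


def range_query(index, lo, hi):
    """Return ids of records whose value lies in the closed range [lo, hi]."""
    hits = []
    for high, entries in index.items():
        bmin, bmax = bin_bounds(high)
        if bmax < lo or bmin > hi:
            continue  # bin lies entirely outside the range
        if lo <= bmin and bmax <= hi:
            # Bin lies entirely inside: no low-order bytes need to be read.
            hits.extend(rid for rid, _ in entries)
        else:
            # Boundary bin: reconstruct each value and test it.
            for rid, low in entries:
                if lo <= struct.unpack(">d", high + low)[0] <= hi:
                    hits.append(rid)
    return sorted(hits)


def reconstruct(index, count):
    """Rebuild the original values exactly (the encoding is lossless)."""
    out = [None] * count
    for high, entries in index.items():
        for rid, low in entries:
            out[rid] = struct.unpack(">d", high + low)[0]
    return out
```

In this sketch, only bins that straddle a query boundary require reading the low-order bytes; bins fully inside the range are answered from the inverted index alone, which is the property the abstract attributes to binning along fixed-precision boundaries.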

Original language: English
Title of host publication: Transactions on Large-Scale Data- and Knowledge-Centered Systems X
Subtitle of host publication: Special Issue on Database and Expert-Systems Applications
Publisher: Springer Verlag
Pages: 95-114
Number of pages: 20
ISBN (Print): 9783642412202
State: Published - 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8220
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
