Analytics-driven lossless data compression for rapid in-situ indexing, storing, and querying

  • John Jenkins
  • Isha Arkatkar
  • Sriram Lakshminarasimhan
  • Neil Shah
  • Eric R. Schendel
  • Stephane Ethier
  • Choong Seock Chang
  • Jacqueline H. Chen
  • Hemanth Kolla
  • Scott Klasky
  • Robert Ross
  • Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

5 Scopus citations

Abstract

The analysis of scientific simulations is highly data-intensive and is becoming an increasingly important challenge. Peta-scale data sets require the use of light-weight query-driven analysis methods, as opposed to heavy-weight schemes that optimize for speed at the expense of size. This paper is an attempt in the direction of query processing over losslessly compressed scientific data. We propose a co-designed double-precision compression and indexing methodology for range queries by performing unique-value-based binning on the most significant bytes of double precision data (sign, exponent, and most significant mantissa bits), and inverting the resulting metadata to produce an inverted index over a reduced data representation. Without the inverted index, our method matches or improves compression ratios over both general-purpose and floating-point compression utilities. The inverted index is light-weight, and the overall storage requirement for both reduced column and index is less than 135%, whereas existing DBMS technologies can require 200-400%. As a proof-of-concept, we evaluate univariate range queries that additionally return column values, a critical component of data analytics, against state-of-the-art bitmap indexing technology, showing multi-fold query performance improvements.
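The binning and inverted-index idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the two most significant bytes of each big-endian IEEE 754 double form the bin key (the abstract does not state the byte count here), it assumes finite input values, and it leaves the low-order bytes uncompressed. The names `HIGH_BYTES`, `split_double`, `build_index`, `key_bounds`, and `range_query` are hypothetical.

```python
import struct
from collections import defaultdict

# Assumption: number of most significant bytes used as the bin key
# (sign, exponent, and the top mantissa bits).
HIGH_BYTES = 2


def split_double(x):
    """Split a double into (high-order key, low-order bytes), big-endian."""
    raw = struct.pack(">d", x)
    return raw[:HIGH_BYTES], raw[HIGH_BYTES:]


def build_index(values):
    """Map each unique high-order key to its (row id, low-order bytes) pairs."""
    bins = defaultdict(list)
    for row_id, x in enumerate(values):
        key, low = split_double(x)
        bins[key].append((row_id, low))
    return dict(bins)


def key_bounds(key):
    """Smallest and largest finite double sharing this high-order key."""
    n_low = 8 - HIGH_BYTES
    a = struct.unpack(">d", key + b"\x00" * n_low)[0]
    b = struct.unpack(">d", key + b"\xff" * n_low)[0]
    # For negative keys the all-ones suffix is the smaller value.
    return min(a, b), max(a, b)


def range_query(index, lo, hi):
    """Return sorted (row id, value) pairs with lo <= value <= hi."""
    hits = []
    for key, entries in index.items():
        bin_lo, bin_hi = key_bounds(key)
        if bin_hi < lo or bin_lo > hi:
            continue  # the whole bin lies outside the query range
        fully_inside = lo <= bin_lo and bin_hi <= hi
        for row_id, low in entries:
            # Rejoining key and low-order bytes restores the exact value.
            x = struct.unpack(">d", key + low)[0]
            if fully_inside or lo <= x <= hi:
                hits.append((row_id, x))
    return sorted(hits)


if __name__ == "__main__":
    data = [1.5, -2.25, 1.75, 1000.0, 0.001, 1.5000001]
    print(range_query(build_index(data), 1.0, 2.0))
```

In this sketch, bins that fall entirely inside or outside the query range are resolved from the key alone, so only boundary bins need a per-value comparison. Values are reconstructed bit for bit, which is what makes the scheme lossless.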

Original language: English
Title of host publication: Database and Expert Systems Applications - 23rd International Conference, DEXA 2012, Proceedings
Publisher: Springer Verlag
Pages: 16-30
Number of pages: 15
Edition: PART 2
ISBN (Print): 9783642325960
State: Published - 2012
Event: 23rd International Conference on Database and Expert Systems Applications, DEXA 2012 - Vienna, Austria
Duration: Sep 3 2012 to Sep 6 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7447 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Database and Expert Systems Applications, DEXA 2012
Country/Territory: Austria
City: Vienna
Period: 09/3/12 to 09/6/12
