Fast set intersection through run-time bitmap construction over PForDelta-compressed indexes

Xiaocheng Zou, Sriram Lakshminarasimhan, David A. Boyuka, Stephen Ranshous, Houjun Tang, Scott Klasky, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Set intersection is a fundamental operation for evaluating conjunctive queries in the context of scientific data analysis. The state-of-the-art approach in performing set intersection, compressed bitmap indexing, achieves high computational efficiency because of cheap bitwise operations; however, overall efficiency is often nullified by the HPC I/O bottleneck, because compressed bitmap indexes typically exhibit a heavy storage footprint. Conversely, the recently-presented PForDelta-compressed index has been demonstrated to be storage-lightweight, but has limited performance for set intersection. Thus, a more effective set intersection approach should be efficient in both computation and I/O. Therefore, we propose a fast set intersection approach that couples the storage light-weight PForDelta indexing format with computationally-efficient bitmaps through a specialized on-the-fly conversion. The resultant challenge is to ensure this conversion process is fast enough to maintain the performance gains from both PForDelta and the bitmaps. To this end, we contribute two key enhancements to PForDelta, BitRun and BitExp, which improve bitmap conversion through bulk bit-setting and a more streamlined PForDelta decoding process, respectively. Our experimental results show that our integrated PForDelta-bitmap method speeds up conjunctive queries by up to 7.7x versus the state-of-the-art approach, while using indexes that require 15%-60% less storage in most cases.

Original languageEnglish
Title of host publicationEuro-Par 2014
Subtitle of host publicationParallel Processing - 20th International Conference, Proceedings
EditorsFernando Silva, Inês Dutra, Vítor Santos Costa
PublisherSpringer Verlag
Pages668-679
Number of pages12
ISBN (Electronic)9783319098722
ISBN (Print)9783319098722
DOIs
StatePublished - 2014
Event20th International Conference on Parallel Processing, Euro-Par 2014 - Porto, Portugal
Duration: Aug 25 2014Aug 29 2014

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume8632 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference20th International Conference on Parallel Processing, Euro-Par 2014
Country/TerritoryPortugal
CityPorto
Period08/25/1408/29/14

Fingerprint

Dive into the research topics of 'Fast set intersection through run-time bitmap construction over PForDelta-compressed indexes'. Together they form a unique fingerprint.

Cite this