TY - GEN
T1 - Fast set intersection through run-time bitmap construction over PForDelta-compressed indexes
AU - Zou, Xiaocheng
AU - Lakshminarasimhan, Sriram
AU - Boyuka, David A.
AU - Ranshous, Stephen
AU - Tang, Houjun
AU - Klasky, Scott
AU - Samatova, Nagiza F.
PY - 2014
Y1 - 2014
N2 - Set intersection is a fundamental operation for evaluating conjunctive queries in the context of scientific data analysis. The state-of-the-art approach in performing set intersection, compressed bitmap indexing, achieves high computational efficiency because of cheap bitwise operations; however, overall efficiency is often nullified by the HPC I/O bottleneck, because compressed bitmap indexes typically exhibit a heavy storage footprint. Conversely, the recently-presented PForDelta-compressed index has been demonstrated to be storage-lightweight, but has limited performance for set intersection. Thus, a more effective set intersection approach should be efficient in both computation and I/O. Therefore, we propose a fast set intersection approach that couples the storage light-weight PForDelta indexing format with computationally-efficient bitmaps through a specialized on-the-fly conversion. The resultant challenge is to ensure this conversion process is fast enough to maintain the performance gains from both PForDelta and the bitmaps. To this end, we contribute two key enhancements to PForDelta, BitRun and BitExp, which improve bitmap conversion through bulk bit-setting and a more streamlined PForDelta decoding process, respectively. Our experimental results show that our integrated PForDelta-bitmap method speeds up conjunctive queries by up to 7.7x versus the state-of-the-art approach, while using indexes that require 15%-60% less storage in most cases.
AB - Set intersection is a fundamental operation for evaluating conjunctive queries in the context of scientific data analysis. The state-of-the-art approach in performing set intersection, compressed bitmap indexing, achieves high computational efficiency because of cheap bitwise operations; however, overall efficiency is often nullified by the HPC I/O bottleneck, because compressed bitmap indexes typically exhibit a heavy storage footprint. Conversely, the recently-presented PForDelta-compressed index has been demonstrated to be storage-lightweight, but has limited performance for set intersection. Thus, a more effective set intersection approach should be efficient in both computation and I/O. Therefore, we propose a fast set intersection approach that couples the storage light-weight PForDelta indexing format with computationally-efficient bitmaps through a specialized on-the-fly conversion. The resultant challenge is to ensure this conversion process is fast enough to maintain the performance gains from both PForDelta and the bitmaps. To this end, we contribute two key enhancements to PForDelta, BitRun and BitExp, which improve bitmap conversion through bulk bit-setting and a more streamlined PForDelta decoding process, respectively. Our experimental results show that our integrated PForDelta-bitmap method speeds up conjunctive queries by up to 7.7x versus the state-of-the-art approach, while using indexes that require 15%-60% less storage in most cases.
UR - http://www.scopus.com/inward/record.url?scp=84958530089&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-09873-9_56
DO - 10.1007/978-3-319-09873-9_56
M3 - Conference contribution
AN - SCOPUS:84958530089
SN - 9783319098722
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 668
EP - 679
BT - Euro-Par 2014
A2 - Silva, Fernando
A2 - Dutra, Inês
A2 - Costa, Vítor Santos
PB - Springer Verlag
T2 - 20th International Conference on Parallel Processing, Euro-Par 2014
Y2 - 25 August 2014 through 29 August 2014
ER -