TY - GEN
T1 - ALACRITY
T2 - Analytics-driven lossless data compression for rapid in-situ indexing, storing, and querying
AU - Jenkins, John
AU - Arkatkar, Isha
AU - Lakshminarasimhan, Sriram
AU - Boyuka, David A.
AU - Schendel, Eric R.
AU - Shah, Neil
AU - Ethier, Stephane
AU - Chang, Choong Seock
AU - Chen, Jackie
AU - Kolla, Hemanth
AU - Klasky, Scott
AU - Ross, Robert
AU - Samatova, Nagiza F.
PY - 2013
Y1 - 2013
AB - High-performance computing architectures face nontrivial data processing challenges as the performance trajectories of computational and I/O components continue to diverge. For scientific data analysis in particular, methods based on generating heavyweight access acceleration structures, e.g., indexes, are becoming less feasible for ever-increasing dataset sizes. We present ALACRITY, demonstrating the effectiveness of a fused data and index encoding of scientific floating-point data in generating lightweight data structures amenable to the common types of queries used in scientific data analysis. We exploit the representation of floating-point values by extracting significant bytes and using the resulting unique values to bin the remaining data along fixed-precision boundaries. To optimize query processing, we use an inverted index that maps each generated bin to the list of records it contains, which accelerates queries with attribute range constraints. Overall, the combined storage footprint of index and data is shown to be below that of numerous bitmap indexing configurations, while query performance matches or exceeds theirs.
UR - http://www.scopus.com/inward/record.url?scp=84892733331&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-41221-9_4
DO - 10.1007/978-3-642-41221-9_4
M3 - Conference contribution
AN - SCOPUS:84892733331
SN - 9783642412202
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 95
EP - 114
BT - Transactions on Large-Scale Data- and Knowledge-Centered Systems X
PB - Springer Verlag
ER -