TY - JOUR
T1 - ISOBAR preconditioner for effective and high-throughput lossless data compression
AU - Schendel, Eric R.
AU - Jin, Ye
AU - Shah, Neil
AU - Chen, Jackie
AU - Chang, C. S.
AU - Ku, Seung Hoe
AU - Ethier, Stephane
AU - Klasky, Scott
AU - Latham, Robert
AU - Ross, Robert
AU - Samatova, Nagiza F.
PY - 2012
Y1 - 2012
AB - Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high-speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzip2, are commonly used on datasets requiring lossless compression. Quite often, however, scientific datasets compress poorly; these are referred to as hard-to-compress datasets, owing to the negative impact of highly entropic content within the data. An important problem in improving lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner for lossless compression that identifies hard-to-compress datasets and optimizes their compression efficiency and throughput.
UR - http://www.scopus.com/inward/record.url?scp=84864224817&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2012.114
DO - 10.1109/ICDE.2012.114
M3 - Conference article
AN - SCOPUS:84864224817
SN - 1084-4627
SP - 138
EP - 149
JO - Proceedings - International Conference on Data Engineering
JF - Proceedings - International Conference on Data Engineering
M1 - 6228079
T2 - IEEE 28th International Conference on Data Engineering, ICDE 2012
Y2 - 1 April 2012 through 5 April 2012
ER -
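
The abstract describes byte-level preconditioning: identifying highly entropic (hard-to-compress) bytes and applying lossless compression only where it pays off. The following minimal Python sketch illustrates that general idea; it is not the authors' ISOBAR implementation, and the byte-column layout, the 7.0 bits-per-byte entropy threshold, and all names here are assumptions made for illustration.

import math
import struct
import zlib
from collections import Counter

def byte_columns(values, width=8):
    # Pack 64-bit floats little-endian, then slice out each byte
    # position across all values, giving 'width' byte columns.
    raw = struct.pack(f"<{len(values)}d", *values)
    return [raw[i::width] for i in range(width)]

def shannon_entropy(data):
    # Shannon entropy in bits per byte (0.0 = constant, 8.0 = random).
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def precondition(values, threshold=7.0):
    # Hypothetical preconditioner: send low-entropy byte columns to
    # zlib and pass near-random columns through uncompressed, so the
    # compressor never wastes effort on incompressible bytes.
    compressed, passthrough = [], []
    for i, col in enumerate(byte_columns(values)):
        if shannon_entropy(col) < threshold:
            compressed.append((i, zlib.compress(col)))
        else:
            passthrough.append((i, col))
    return compressed, passthrough

# Example: smooth scientific data has low-entropy sign/exponent bytes
# but near-random low-order mantissa bytes.
values = [math.sin(i / 100.0) for i in range(4096)]
comp, raw = precondition(values)

In this sketch the sign/exponent columns typically compress well while the low-order mantissa columns are passed through unmodified, which mirrors the efficiency-and-throughput trade-off the abstract highlights.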