TY - GEN
T1 - Compressing the incompressible with ISABELA
T2 - 17th International Conference on Parallel Processing, Euro-Par 2011
AU - Lakshminarasimhan, Sriram
AU - Shah, Neil
AU - Ethier, Stephane
AU - Klasky, Scott
AU - Latham, Rob
AU - Ross, Rob
AU - Samatova, Nagiza F.
PY - 2011
Y1 - 2011
N2 - Modern large-scale scientific simulations running on HPC systems generate data on the order of terabytes during a single run. To lessen the I/O load during a simulation run, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Yet, lossless compression techniques are hardly suitable for scientific data due to its inherently random nature; for the applications used here, they achieve less than a 10% compression rate. They also impose significant overhead during decompression, making them unsuitable for data analysis and visualization that require repeated data access. To address this problem, we propose an effective method for In-situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a preconditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ≥0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈85%, while introducing only a negligible runtime overhead on simulations. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression. Moreover, besides being a communication-free and scalable compression technique, ISABELA supports inherently local decompression: it does not need to decode the entire dataset, making it attractive for random access.
KW - B-spline
KW - Data-intensive Application
KW - High Performance Computing
KW - In-situ Processing
KW - Lossy Compression
UR - http://www.scopus.com/inward/record.url?scp=80052343989&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-23400-2_34
DO - 10.1007/978-3-642-23400-2_34
M3 - Conference contribution
AN - SCOPUS:80052343989
SN - 9783642233999
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 366
EP - 379
BT - Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings
Y2 - 29 August 2011 through 2 September 2011
ER -