TY - JOUR
T1 - ISABELA for effective in situ compression of scientific data
AU - Lakshminarasimhan, Sriram
AU - Shah, Neil
AU - Ethier, Stephane
AU - Ku, Seung Hoe
AU - Chang, C. S.
AU - Klasky, Scott
AU - Latham, Rob
AU - Ross, Rob
AU - Samatova, Nagiza F.
PY - 2013/2
Y1 - 2013/2
N2 - Exploding dataset sizes from extreme-scale scientific simulations necessitate efficient data management and reduction schemes to mitigate I/O costs. With the discrepancy between I/O bandwidth and computational power, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Although data compression can be an effective solution, the random nature of real-valued scientific datasets renders lossless compression routines ineffective. These techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access. To address this problem, we propose an effective method for In situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a pre-conditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ≥0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈85%, while introducing only a negligible overhead on simulations in terms of runtime. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression, in terms of data reduction and accuracy. We extend our previous paper by additionally building a communication-free, scalable parallel storage framework on top of ISABELA-compressed data that is ideally suited for extreme-scale analytical processing. The basis for our storage framework is an inherently local decompression method (it need not decode the entire data), which allows for random access decompression and low-overhead task division that can be exploited over heterogeneous architectures. Furthermore, analytical operations such as correlation and query processing run quickly and accurately over data in the compressed space.
AB - Exploding dataset sizes from extreme-scale scientific simulations necessitate efficient data management and reduction schemes to mitigate I/O costs. With the discrepancy between I/O bandwidth and computational power, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Although data compression can be an effective solution, the random nature of real-valued scientific datasets renders lossless compression routines ineffective. These techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access. To address this problem, we propose an effective method for In situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a pre-conditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ≥0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈85%, while introducing only a negligible overhead on simulations in terms of runtime. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression, in terms of data reduction and accuracy. We extend our previous paper by additionally building a communication-free, scalable parallel storage framework on top of ISABELA-compressed data that is ideally suited for extreme-scale analytical processing. The basis for our storage framework is an inherently local decompression method (it need not decode the entire data), which allows for random access decompression and low-overhead task division that can be exploited over heterogeneous architectures. Furthermore, analytical operations such as correlation and query processing run quickly and accurately over data in the compressed space.
KW - B-spline
KW - data-intensive application
KW - high performance computing
KW - in situ processing
KW - lossy compression
UR - http://www.scopus.com/inward/record.url?scp=84874105831&partnerID=8YFLogxK
U2 - 10.1002/cpe.2887
DO - 10.1002/cpe.2887
M3 - Article
AN - SCOPUS:84874105831
SN - 1532-0626
VL - 25
SP - 524
EP - 540
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
IS - 4
ER -