TY - GEN
T1 - A General Framework for Error-controlled Unstructured Scientific Data Compression
AU - Gong, Qian
AU - Wang, Zhe
AU - Reshniak, Viktor
AU - Liang, Xin
AU - Chen, Jieyang
AU - Liu, Qing
AU - Athawale, Tushar M.
AU - Ju, Yi
AU - Rangarajan, Anand
AU - Ranka, Sanjay
AU - Podhorszki, Norbert
AU - Archibald, Rick
AU - Klasky, Scott
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Data compression plays a key role in reducing storage and I/O costs. Traditional lossy methods primarily target data on rectilinear grids and cannot leverage the spatial coherence in unstructured mesh data, leading to suboptimal compression ratios. We present a multi-component, error-bounded compression framework designed to enhance the compression of floating-point unstructured mesh data, which is common in scientific applications. Our approach involves interpolating mesh data onto a rectilinear grid and then separately compressing the grid interpolation and the interpolation residuals. This method is general, independent of mesh types and topologies, and can be seamlessly integrated with existing lossy compressors for improved performance. We evaluated our framework across twelve variables from two synthetic datasets and two real-world simulation datasets. The results indicate that the multi-component framework consistently outperforms state-of-the-art lossy compressors on unstructured data, achieving, on average, a 2.3× to 3.5× improvement in compression ratios, with error bounds ranging from 1 × 10^-6 to 1 × 10^-2. We further investigate the impact of hyperparameters, such as grid spacing and error allocation, to deliver optimal compression ratios in diverse datasets.
AB - Data compression plays a key role in reducing storage and I/O costs. Traditional lossy methods primarily target data on rectilinear grids and cannot leverage the spatial coherence in unstructured mesh data, leading to suboptimal compression ratios. We present a multi-component, error-bounded compression framework designed to enhance the compression of floating-point unstructured mesh data, which is common in scientific applications. Our approach involves interpolating mesh data onto a rectilinear grid and then separately compressing the grid interpolation and the interpolation residuals. This method is general, independent of mesh types and topologies, and can be seamlessly integrated with existing lossy compressors for improved performance. We evaluated our framework across twelve variables from two synthetic datasets and two real-world simulation datasets. The results indicate that the multi-component framework consistently outperforms state-of-the-art lossy compressors on unstructured data, achieving, on average, a 2.3× to 3.5× improvement in compression ratios, with error bounds ranging from 1 × 10^-6 to 1 × 10^-2. We further investigate the impact of hyperparameters, such as grid spacing and error allocation, to deliver optimal compression ratios in diverse datasets.
KW - error-control
KW - multi-components
KW - unstructured data compression
UR - http://www.scopus.com/inward/record.url?scp=85205997001&partnerID=8YFLogxK
U2 - 10.1109/e-Science62913.2024.10678699
DO - 10.1109/e-Science62913.2024.10678699
M3 - Conference contribution
AN - SCOPUS:85205997001
T3 - Proceedings - 2024 IEEE 20th International Conference on e-Science, e-Science 2024
BT - Proceedings - 2024 IEEE 20th International Conference on e-Science, e-Science 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on e-Science, e-Science 2024
Y2 - 16 September 2024 through 20 September 2024
ER -