TY - GEN
T1 - Online and Scalable Data Compression Pipeline with Guarantees on Quantities of Interest
AU - Banerjee, Tania
AU - Lee, Jaemoon
AU - Choi, Jong
AU - Gong, Qian
AU - Chen, Jieyang
AU - Chang, Choongseok
AU - Klasky, Scott
AU - Rangarajan, Anand
AU - Ranka, Sanjay
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Data compression is becoming critical for data-intensive scientific applications. Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). Prior work has shown that a pipeline can be built to guarantee error on the primary data (PD) within user-defined bounds and achieve near-floating point QoI errors. In this paper, we present novel computational approaches for accelerating the pipeline and demonstrate results that enable concurrent execution of compression in parallel with the simulation nodes. This allows compression, including the writing of the required compression data, for the previous time step to be completed while the simulation proceeds with the current time step. Overall, the approach presented in this paper results in a 6-8 times improvement in computational overhead compared to previous work. These results were obtained using data generated by a large-scale fusion code called XGC, which produces hundreds of terabytes of data in a single day.
UR - http://www.scopus.com/inward/record.url?scp=85174302826&partnerID=8YFLogxK
U2 - 10.1109/e-Science58273.2023.10254934
DO - 10.1109/e-Science58273.2023.10254934
M3 - Conference contribution
AN - SCOPUS:85174302826
T3 - Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023
BT - Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE International Conference on e-Science, e-Science 2023
Y2 - 9 October 2023 through 14 October 2023
ER -