Feature-preserving Lossy Compression for in Situ Data Analysis

Igor Yakushin, Kshitij Mehta, Jieyang Chen, Matthew Wolf, Ian Foster, Scott Klasky, Todd Munson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Scopus citations

Abstract

The traditional model of having simulations write data to disk for offline analysis can be prohibitively expensive on computers with limited storage capacity or I/O bandwidth. In situ data analysis has emerged as a necessary paradigm to address this issue and is expected to play an important role in exascale computing. We demonstrate the various aspects and challenges involved in setting up a comprehensive in situ data analysis pipeline that consists of a simulation coupled with compression and feature tracking routines, a framework for assessing compression quality, a middleware library for I/O and data management, and a workflow tool for composing and running the pipeline. We perform studies of compression mechanisms and parameters on two supercomputers, Summit at Oak Ridge National Laboratory and Theta at Argonne National Laboratory, for two example application pipelines. We show that the optimal choice of compression parameters varies with data, time, and analysis, and that periodic retuning of the in situ pipeline can improve compression quality. Finally, we discuss our perspective on the wider adoption of in situ data analysis and management practices and technologies in the HPC community.

Original language: English
Title of host publication: 49th International Conference on Parallel Processing, ICPP 2020 - Workshop Proceedings
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450388689
State: Published - Aug 17 2020
Event: 49th International Conference on Parallel Processing, ICPP Workshops 2020 - Virtual, Online, Canada
Duration: Aug 17 2020 - Aug 20 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 49th International Conference on Parallel Processing, ICPP Workshops 2020
Country/Territory: Canada
City: Virtual, Online
Period: 08/17/20 - 08/20/20

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources at the Argonne and Oak Ridge Leadership Computing Facilities, DOE Office of Science User Facilities supported under Contracts DE-AC02-06CH11357 and DE-AC05-00OR22725, respectively.

Funders (funder number):
DOE Office of Science (DE-AC05-00OR22725, DE-AC02-06CH11357)
U.S. Department of Energy Office of Science
National Nuclear Security Administration

    Keywords

    • Compression
    • data analysis
    • high performance
    • in situ
