To Derive or Not to Derive: I/O Libraries Take Charge of Derived Quantities Computation

Ana Gainaru, Norbert Podhorszki, Liz Dulac, Qian Gong, Scott Klasky, Greg Eisenhauer, Antonios Kougkas, Xian He Sun, Jay Lofstead

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The ever-increasing volume of data produced by HPC simulations necessitates scalable methods for data exploration and knowledge extraction. Scientific data analysis often involves complex queries across distributed datasets, requiring manipulation of multiple primary variables and generating derived data that needs to be handled efficiently, creating challenges for applications that need to parse many large datasets. Relying on individual applications to handle all intermediate data generally leads to redundant computations across studies and unnecessary data transfers. In this paper, we investigate the performance of different approaches where applications define derived variables as quantities of interest (QoIs) and offload the computation and transfer of these QoIs to the I/O library. This significantly reduces redundancy and optimizes data movement across the distributed storage and processing infrastructure by allowing control over when and where derived variables are computed. We present a detailed analysis of the performance-storage trade-offs associated with different solutions and showcase results for our study on two large-scale datasets created from climate and combustion simulations.
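
The idea of declaring a derived variable once and letting the I/O layer decide when to compute it can be illustrated with a small sketch. This is not the paper's implementation or any real library's API: the `ToyIO` class, its `define_derived`, `write`, `end_step`, and `read` methods, and the `"on_write"` / `"on_read"` mode names are all hypothetical, chosen only to show the storage-versus-recomputation trade-off the abstract refers to.

```python
import math


class ToyIO:
    """Hypothetical stand-in for an I/O library (illustration only)."""

    def __init__(self):
        self.stored = {}   # variables physically held in "storage"
        self.derived = {}  # name -> (function, input names, mode)

    def define_derived(self, name, func, inputs, mode="on_read"):
        # The application declares the quantity of interest once;
        # the mode says whether it is materialized at write or read time.
        if mode not in ("on_write", "on_read"):
            raise ValueError("mode must be 'on_write' or 'on_read'")
        self.derived[name] = (func, list(inputs), mode)

    def write(self, name, values):
        self.stored[name] = list(values)

    def end_step(self):
        # Materialize every "on_write" derived variable once the
        # primary variables for this step have been written.
        for name, (func, inputs, mode) in self.derived.items():
            if mode == "on_write":
                self.stored[name] = self._compute(func, inputs)

    def _compute(self, func, inputs):
        columns = [self.stored[i] for i in inputs]
        return [func(*vals) for vals in zip(*columns)]

    def read(self, name):
        # Stored data is returned directly; an "on_read" derived
        # variable is recomputed from its primary variables.
        if name in self.stored:
            return self.stored[name]
        func, inputs, _ = self.derived[name]
        return self._compute(func, inputs)


# Velocity magnitude derived from two primary variables, u and v.
io = ToyIO()
io.define_derived("speed", math.hypot, ["u", "v"], mode="on_read")
io.write("u", [3.0, 6.0])
io.write("v", [4.0, 8.0])
io.end_step()
speed = io.read("speed")
```

In this sketch, `"on_write"` spends extra storage so that later reads are cheap, while `"on_read"` stores nothing extra and pays the computation on every query. Either way the application defines the quantity once instead of recomputing it in each analysis script.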

Original language: English
Title of host publication: Proceedings - 2024 IEEE 36th International Symposium on Computer Architecture and High-Performance Computing, SBAC-PAD 2024
Publisher: IEEE Computer Society
Pages: 105-115
Number of pages: 11
ISBN (Electronic): 9798350356168
DOIs
State: Published - 2024
Event: 36th IEEE International Symposium on Computer Architecture and High-Performance Computing, SBAC-PAD 2024 - Hilo, United States
Duration: Nov 13 2024 to Nov 15 2024

Publication series

Name: Proceedings - Symposium on Computer Architecture and High Performance Computing
ISSN (Print): 1550-6533

Conference

Conference: 36th IEEE International Symposium on Computer Architecture and High-Performance Computing, SBAC-PAD 2024
Country/Territory: United States
City: Hilo
Period: 11/13/24 to 11/15/24

Keywords

  • Derived Variables
  • HPC Analysis
  • HPC Quantities of Interest
  • Large-scale I/O
  • Queries for Scientific Data
