Hades: A Context-Aware Active Storage Framework for Accelerating Large-Scale Data Analysis

Jaime Cernuda, Luke Logan, Ana Gainaru, Scott Klasky, Jay Lofstead, Anthony Kougkas, Xian He Sun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Modern simulation workflows generate and analyze massive amounts of data using I/O libraries like Adios2 and NetCDF. Although extensive work has optimized I/O during the simulation phase, executing analytical queries, which often require iterative traversals of large files, remains cumbersome and is usually constrained by low I/O performance. Instead of waiting for the analysis phase to process queries, quantities can be derived asynchronously during data production and cached, speeding up future queries. In this work, we introduce Hades, a context-aware I/O layer designed to efficiently derive insights from selected quantities without compromising overall workflow performance. Hades actively and asynchronously computes and stores these quantities while the data is in transit. It leverages a hierarchical buffering system with data access-aware prefetching to ensure timely access to relevant data. It offers a flexible query interface that lets users easily define derived quantities and gives them control over data placement decisions. Hades is implemented using an Adios2 plugin engine and the Hermes buffering platform, enabling transparent use by any Adios-powered application or workflow. Experimental results demonstrate performance improvements of up to 3-4x for the tested real-world scientific producer-consumer workflows.

Original language: English
Title of host publication: Proceedings - 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 577-586
Number of pages: 10
ISBN (Electronic): 9798350395662
State: Published - 2024
Event: 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024 - Philadelphia, United States
Duration: May 6 2024 to May 9 2024

Publication series

Name: Proceedings - 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024

Conference

Conference: 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2024
Country/Territory: United States
City: Philadelphia
Period: 05/6/24 to 05/9/24

Funding

This work is supported by the U.S. Department of Energy (DOE) under DE-SC0023263.

Funders                      Funder number
U.S. Department of Energy    DE-SC0023263
U.S. Department of Energy

    Keywords

    • Active Storage
    • Context Awareness
    • Data Operator
    • Hierarchical Storage
    • In-transit Computing
    • Metadata Management
