Canopus: Enabling extreme-scale data analytics on big HPC storage via progressive refactoring

Tao Lu, Eric Suchyta, Jong Choi, Norbert Podhorszki, Scott Klasky, Qing Liu, Dave Pugmire, Matthew Wolf, Mark Ainsworth

Research output: Contribution to conference › Paper › peer-review


Abstract

High-accuracy scientific simulations on high performance computing (HPC) platforms generate large amounts of data. To allow these data to be analyzed efficiently, simulation outputs need to be refactored, compressed, and properly mapped onto storage tiers. This paper presents Canopus, a progressive data management framework for storing and analyzing big scientific data. Canopus refactors simulation results into a much smaller base dataset along with a series of deltas, at low overhead. The refactored data are then compressed, mapped, and written onto storage tiers. For data analytics, the refactored data are selectively retrieved to restore the data at a specific level of accuracy that satisfies the analysis requirements. Canopus thus enables end users to trade analysis speed against accuracy on the fly. Canopus is demonstrated and thoroughly evaluated using blob detection on fusion simulation data.
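The abstract outlines the core mechanism: decimate the full-resolution output into a small base dataset plus a series of deltas, then restore to a chosen accuracy level by applying only as many deltas as the analysis needs. The sketch below illustrates that base-plus-deltas idea on a 1D array, assuming simple 2x subsampling as the decimation operator; Canopus itself operates on simulation meshes and additionally compresses and places each piece on storage tiers, and the function names here are illustrative rather than the Canopus API.

```python
import numpy as np

def refactor(data, levels):
    """Split `data` into a coarse base plus one delta per level.

    Decimation here is plain 2x subsampling; each delta records what
    that level of decimation loses, so applying all deltas restores
    the original exactly. (Illustrative sketch, not the Canopus API.)
    """
    deltas = []
    current = data
    for _ in range(levels):
        coarse = current[::2]                          # 2x decimation
        approx = np.repeat(coarse, 2)[:len(current)]   # nearest-neighbor upsample
        deltas.append(current - approx)                # information lost at this level
        current = coarse
    return current, deltas[::-1]                       # base, then deltas coarsest-first

def restore(base, deltas, accuracy_level):
    """Rebuild the field using only the first `accuracy_level` deltas.

    Fewer deltas means less data retrieved (e.g. from slower tiers) and
    lower accuracy; all deltas yield an exact reconstruction.
    """
    current = base
    for delta in deltas[:accuracy_level]:
        current = np.repeat(current, 2)[:len(delta)] + delta
    return current

x = np.sin(np.linspace(0, 4 * np.pi, 64))
base, deltas = refactor(x, levels=3)
coarse_view = restore(base, deltas, accuracy_level=1)  # fast, approximate analysis
exact_view = restore(base, deltas, accuracy_level=3)   # full accuracy
assert np.allclose(exact_view, x)
```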

Original language: English
State: Published - 2017
Event: 9th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2017, co-located with USENIX ATC 2017 - Santa Clara, United States
Duration: Jul 10, 2017 - Jul 11, 2017

Conference

Conference: 9th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2017, co-located with USENIX ATC 2017
Country/Territory: United States
City: Santa Clara
Period: 07/10/17 - 07/11/17

Funding

This research was supported by the DOE SIRIUS project; the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration; and the Oak Ridge Leadership Computing Facility.
