Querying large scientific data sets with adaptable IO system ADIOS

Junmin Gu, Scott Klasky, Norbert Podhorszki, Ji Qiang, Kesheng Wu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

15 Scopus citations

Abstract

When working with a large dataset, a relatively small fraction of data records are of interest in each analysis operation. For example, while examining a billion-particle dataset from an accelerator model, the scientists might focus on a few thousand fastest particles, or on the particle farthest from the beam center. In general, this type of selective data access is challenging because the selected data records could be anywhere in the dataset and require a significant amount of time to locate and retrieve. In this paper, we report our experience of addressing this data access challenge with the Adaptable IO System ADIOS. More specifically, we design a query interface for ADIOS to allow arbitrary combinations of range conditions on known variables, implement a number of different mechanisms for resolving these selection conditions, and devise strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanism can retrieve the selected data records orders of magnitude faster than the brute-force approach. Our work relies heavily on the in situ data processing feature of ADIOS to allow user functions to be executed in the data transport pipeline. This feature allows us to build indexes for efficient query processing, and to perform other intricate analyses while the data is in memory.

Original language: English
Title of host publication: Supercomputing Frontiers - 4th Asian Conference, SCFA 2018, Proceedings
Editors: Rio Yokota, Weigang Wu
Publisher: Springer Verlag
Pages: 51-69
Number of pages: 19
ISBN (Print): 9783319699523
State: Published - 2018
Event: 4th Asian Conference on Supercomputing Frontiers, SCFA 2018 - Singapore, Singapore
Duration: Mar 26 2018 to Mar 29 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10776 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Asian Conference on Supercomputing Frontiers, SCFA 2018
Country/Territory: Singapore
City: Singapore
Period: 03/26/18 to 03/29/18

Funding

Acknowledgment. This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 (for LBNL) and DE-AC05-00OR22725 Mod 877 (for ORNL). This research also used resources of the National Energy Research Scientific Computing Center supported by the same funding agency.
