Domain-Specific Type-Safe APIs for Hierarchical Scientific Data with Modern C++

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

General-purpose library application programming interfaces (APIs) for self-describing hierarchical scientific data storage, such as the HDF5 and NetCDF libraries, are traditionally of runtime nature. Runtime errors for entry existence and data types are typically caught later in the development process of higher-level application-specific APIs. In this paper, we propose exploiting modern C++ metaprogramming features to add compile-time type-safety to improve the interaction with a well-defined metadata-rich scientific schema in domain-specific hierarchical datasets. We tackle two aspects of common use: (i) direct data access, (ii) flexible “in-memory” index models for efficient search and data processing. The proposed APIs use C++17’s template type auto deduction features, C++11’s enum class for type-safety and C-style preprocessor macros for generative templated code. We showcase the pros and cons of our initial work on the standard NeXus schema used for annotating and storing experimental neutron scattering data at several facilities around the world on top of HDF5. Extendable compile-time type-safe APIs are a desirable feature that could be indexed by any modern integrated development environment (IDE). Hence, such APIs can help ease the learning curve for domain scientists using a less error-prone software interaction to enhance the findability of their data without resorting to a domain-specific language (DSL).
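The combination of features named in the abstract (a C++17 `auto` non-type template parameter, a C++11 `enum class`, and a preprocessor macro that generates the templated code) can be illustrated with a minimal sketch. This is not the paper's API. The names `SCHEMA_ENTRIES`, `Entry`, `EntryTraits`, `value_t`, and `InMemoryFile` are hypothetical, the three paths are illustrative rather than taken from the NeXus schema, and an in-memory map stands in for an HDF5 file so that the example stays self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical schema, listed once: enumerator, path, stored C++ type.
#define SCHEMA_ENTRIES(X)                                      \
    X(Title,        "/entry/title",           std::string)     \
    X(ProtonCharge, "/entry/proton_charge",   double)          \
    X(EventIds,     "/entry/events/event_id", std::vector<int>)

// C++11 enum class: one strongly typed enumerator per schema entry.
enum class Entry {
#define X(name, path_, type_) name,
    SCHEMA_ENTRIES(X)
#undef X
};

// Primary template is left undefined, so an unknown entry cannot compile.
template <Entry E> struct EntryTraits;

// The macro generates one specialization per entry (path and value type).
#define X(name, path_, type_)                                  \
    template <> struct EntryTraits<Entry::name> {              \
        using type = type_;                                    \
        static constexpr std::string_view path = path_;        \
    };
SCHEMA_ENTRIES(X)
#undef X

// C++17 auto non-type template parameter: the entry is passed as a value.
template <auto E>
using value_t = typename EntryTraits<E>::type;

// Stand-in for a hierarchical file (assumption: a real version would wrap HDF5).
class InMemoryFile {
public:
    template <auto E>
    void write(value_t<E> value) {
        data_[std::string(EntryTraits<E>::path)] = std::move(value);
    }

    template <auto E>
    const value_t<E>& read() const {
        // at() still throws at run time if the entry was never written.
        return std::get<value_t<E>>(data_.at(std::string(EntryTraits<E>::path)));
    }

private:
    std::map<std::string, std::variant<std::string, double, std::vector<int>>> data_;
};

// The value type of each entry is known at compile time.
static_assert(std::is_same_v<value_t<Entry::ProtonCharge>, double>);

inline void demo() {
    InMemoryFile file;
    file.write<Entry::Title>("run 42");
    file.write<Entry::ProtonCharge>(1.5);
    file.write<Entry::EventIds>({1, 2, 3});
    assert(file.read<Entry::ProtonCharge>() == 1.5);
    // file.read<Entry::Missing>();         // would not compile: no such enumerator
    // file.write<Entry::Title>(file);      // would not compile: wrong value type
}
```

In this sketch, a misspelled entry name or a mismatched value type is rejected by the compiler, and an IDE can offer the enumerators of `Entry` for completion. Whether an entry is actually present in a given file remains a run-time question.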

Original language: English
Title of host publication: Responsible Data Science - Select Proceedings of ICDSE 2021
Editors: Jimson Mathew, G. Santhosh Kumar, Deepak Padmanabhan, Joemon M. Jose
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 191-204
Number of pages: 14
ISBN (Print): 9789811944529
DOIs
State: Published - 2022
Event: 7th International Conference on Data Science and Engineering, ICDSE 2021 - Patna, India
Duration: Dec 17 2021 to Dec 18 2021

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 940
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 7th International Conference on Data Science and Engineering, ICDSE 2021
Country/Territory: India
City: Patna
Period: 12/17/21 to 12/18/21

Funding

Acknowledgements: Work at Oak Ridge National Laboratory was sponsored by the Division of Scientific User Facilities, Office of Basic Energy Sciences, US Department of Energy, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

Funder: U.S. Department of Energy (funder number: DE-AC05-00OR22725)
Funder: Basic Energy Sciences

    Keywords

    • C++
    • FAIR scientific data
    • HDF5
    • Template metaprogramming
    • Type-safe API
