Abstract
The increasing data requirements of complex models demand robust, reproducible, and transparent systems to track and prepare model inputs. Here we describe version 1.0 of the gcamdata R package, which processes raw inputs to produce the hundreds of XML files needed by the GCAM integrated human-earth systems model. It features extensive functional and unit testing as well as data tracing and visualization, and it enforces metadata, documentation, and flexibility in its component data-processing subunits. Although this package is specific to GCAM, many of its structural pieces and approaches should be broadly applicable to, and reusable by, other complex model/data systems aiming to improve transparency, reproducibility, and flexibility.
| Original language | English |
|---|---|
| Article number | 6 |
| Journal | Journal of Open Research Software |
| Volume | 7 |
| Issue number | 1 |
| State | Published - 2019 |
| Externally published | Yes |
Funding
The management and financial expertise of Ibimina Nweke and Kali Wood provided crucial support during gcamdata development. The package relies heavily on land-use and land-cover change data developed by Alan Di Vittorio of Lawrence Berkeley National Laboratory. Primary support for this work was provided by the U.S. Department of Energy, Office of Science, as part of research in Multi-Sector Dynamics, Earth and Environmental System Modeling Program. Additional support was provided by the U.S. Department of Energy Offices of Fossil Energy, Nuclear Energy, and Energy Efficiency and Renewable Energy and the U.S. Environmental Protection Agency.
Keywords
- Data provenance
- Earth modeling
- Human-earth system modeling
- Reproducibility
- Unit testing