Abstract
We present ASDF, the Adaptable Seismic Data Format, a modern and practical data format for all branches of seismology and beyond. The growing volume of freely available data coupled with ever-expanding computational power opens avenues to tackle larger and more complex problems. Current bottlenecks include inefficient resource usage and insufficient data organization. Properly scaling a problem requires the resolution of both these challenges, and existing data formats are no longer up to the task. ASDF stores any number of synthetic, processed or unaltered waveforms in a single file. A key improvement compared to existing formats is the inclusion of comprehensive meta information, such as event or station information, in the same file. It is also usable for any non-waveform data, for example, cross-correlations, adjoint sources or receiver functions. Last but not least, full provenance information can be stored alongside each item of data, thereby enhancing reproducibility and accountability. Any data set in our proposed format is self-describing and can be readily exchanged with others, facilitating collaboration. The utilization of the HDF5 container format grants efficient and parallel I/O operations, integrated compression algorithms and checksums to guard against data corruption. To not reinvent the wheel and to build upon past developments, we use existing standards like QuakeML, StationXML, W3C PROV and HDF5 wherever feasible. Usability and tool support are crucial for any new format to gain acceptance. We developed mature C/Fortran and Python-based APIs coupling ASDF to the widely used SPECFEM3D_GLOBE and ObsPy toolkits.
Original language | English |
---|---|
Pages (from-to) | 1003-1011 |
Number of pages | 9 |
Journal | Geophysical Journal International |
Volume | 207 |
Issue number | 2 |
State | Published - Nov 1 2016 |
Funding
This research was partially supported by the EU-FP7 VERCE project (number 283543) and US NSF grant 1112906. We are grateful for the QUEST Initial Training Network (Marie Curie Actions, http://www.quest-itn.org) and the Computational Infrastructure for Geodynamics (CIG, https://geodynamics.org/) organization for holding a joint workshop that sparked the creation of the ASDF format. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The authors also recognize support from the NSERC G8 Research Councils Initiative on Multilateral Research Funding and the Discovery Grant No. 487237. Additionally, we thank Chad Trabant and Tim Ahern from the Incorporated Research Institutions for Seismology (IRIS) as well as Emiliano Russo, Peter Danecek and Rodolfo Puglia for fruitful discussions and useful tips. We also thank editor Andrea Morelli and two anonymous reviewers for their thoughtful comments which helped improve the manuscript. Finally, we gratefully acknowledge conversations with HDF5 Director of Earth Science Ted Habermann and help from Mohamad Chaarawi via the HDF5 User's Forum.
Keywords
- Computational seismology
- Seismic tomography
- Time-series analysis
- Wave propagation