The future low-temperature geochemical data-scape as envisioned by the U.S. geochemical community

Susan L. Brantley, Tao Wen, Deborah A. Agarwal, Jeffrey G. Catalano, Paul A. Schroeder, Kerstin Lehnert, Charuleka Varadharajan, Julie Pett-Ridge, Mark Engle, Anthony M. Castronova, Richard P. Hooper, Xiaogang Ma, Lixin Jin, Kenton McHenry, Emma Aronson, Andrew R. Shaughnessy, Louis A. Derry, Justin Richardson, Jerad Bales, Eric M. Pierce

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Data sharing benefits the researcher, the scientific community, and the public by allowing the impact of data to be generalized beyond one project and by making science more transparent. However, many scientific communities have not developed protocols or standards for publishing, citing, and versioning datasets. One community that lags in data management is that of low-temperature geochemistry (LTG). This paper resulted from an initiative from 2018 through 2020 to convene LTG and data scientists in the U.S. to strategize future management of LTG data. Through webinars, a workshop, a preprint, a townhall, and a community survey, the group of U.S. scientists discussed the landscape of data management for LTG – the data-scape. Currently this data-scape includes a “street bazaar” of data repositories. This was deemed appropriate in the same way that LTG scientists publish articles in many journals. The variety of data repositories and journals reflects the fact that LTG scientists target many different scientific questions, produce data with extremely different structures and volumes, and utilize copious and complex metadata. Nonetheless, the group agreed that publication of LTG science must be accompanied by sharing of data in publicly accessible repositories, and, for sample-based data, registration of samples with globally unique persistent identifiers. LTG scientists should use certified data repositories that are either highly structured databases designed for specialized types of data, or unstructured generalized data systems. Recognizing the need for tools to enable search and cross-referencing across the proliferating data repositories, the group proposed that the overall data informatics paradigm in LTG should shift from “build data repository, data will come” to “publish data online, cybertools will find”. Funding agencies could also provide portals for LTG scientists to register funded projects and datasets, and forge approaches that cross national boundaries. The needed transformation of the LTG data culture requires emphasis in student education on science and management of data.

Original language: English
Article number: 104933
Journal: Computers and Geosciences
Volume: 157
DOIs
State: Published - Dec 2021

Funding

This paper was initiated during a workshop held February 18–20, 2020, that was funded by the U.S. National Science Foundation (EAR 19–39257 to S. L. Brantley). The workshop, where twenty-four institutions were represented, was organized by S. L. Brantley, T. Wen, D. Agarwal, and J. G. Catalano. All participants engaged in the ideas in the paper but some, not listed as coauthors, did not work on this document. Those latter participants included E. Barrera, P. Bennett, O. Harvey, R. Hazen, N. Kabengi, M. Leon, S. Morrison, C. Reinhard, J. Tang, and K. Williams. Some authors could not participate in the workshop. Four webinars were run prior to the workshop: D. Agarwal (11/25/2019); X. Ma (12/16/2019); K. Lehnert (1/17/2020); J. Bales (1/31/2020). An earlier version of this manuscript was posted at EarthArXiv (https://eartharxiv.org/repository/view/1839/), and was sent out to 350 LTG scientists funded in the U.S. with an online survey. Twenty-seven scientists responded to the survey and 24 scientists participated in an online townhall discussion. Support for P. Schroeder is acknowledged from EAR-GEO-1331846. D. Agarwal and C. Varadharajan were funded as part of the ESS-DIVE project by the U.S. Department of Energy's Office of Science under Contract No. DE-AC02-05CH11231. Helpful reviews from two anonymous reviewers and Francis Albarede and associate editor Pierre Lanari are acknowledged.
These databases and other long-term repositories (Table 1) share some attributes. First, they target only a subset of data as defined by their mission or funding: PetDB, for example, was funded by NSF's RIDGE Program to collate the geochemistry of igneous and metamorphic rocks of the ocean floor. These databases do not include the geochemistry of all rock types even though they have accepted similar geochemical data for other materials. Second, successful databases tend to receive consistent funding over many years from government agencies, private foundations, libraries, or universities, or are led by a small group of dedicated scientists (<12) who attract data from other contributing scientists.
Another idea that emerged during this initiative and that would enable data discovery was that funders of LTG science could build portals to register their LTG projects, similar to the BCO-DMO portal built for oceanographic and polar projects funded by the NSF (National Science Foundation Biological and Chemical Oceanography Data Management Office, 2020). A somewhat similar example is the U.S. National Energy Technology Laboratory Energy Data eXchange (N.E.T.L., 2020). All projects funded through a given program would be required to register on the site, and each project would be required either to upload its data to the portal itself or to provide a link to its data in another online data management system. The portal could thus provide data management and navigation services at no cost to the program-funded projects and would promote discovery of data produced with the agency's funding.
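The registration rule described above (every funded project registers, and each either uploads its data to the portal or links to data held elsewhere) can be illustrated with a minimal sketch. This is not a description of BCO-DMO or the Energy Data eXchange; the class name, field names, award identifiers, and URL below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProjectRegistration:
    """One funded project as a hypothetical funder portal might record it."""
    award_id: str                     # hypothetical identifier, not a real award
    title: str
    uploaded_files: List[str] = field(default_factory=list)  # data held by the portal
    external_data_url: Optional[str] = None  # link to data in another system

    def satisfies_data_requirement(self) -> bool:
        """True if data were uploaded or a link to an external system was given."""
        return bool(self.uploaded_files) or bool(self.external_data_url)


def projects_missing_data(projects: List[ProjectRegistration]) -> List[str]:
    """Return award IDs of registered projects that have not yet shared data."""
    return [p.award_id for p in projects if not p.satisfies_data_requirement()]


if __name__ == "__main__":
    registry = [
        ProjectRegistration("AWARD-0001", "Example A", uploaded_files=["soil.csv"]),
        ProjectRegistration("AWARD-0002", "Example B",
                            external_data_url="https://example.org/dataset"),
        ProjectRegistration("AWARD-0003", "Example C"),
    ]
    print(projects_missing_data(registry))
```

Keeping the upload and the external link as two optional fields mirrors the either/or requirement in the text: the portal stays a single point of discovery for a program's projects without having to host every dataset itself.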

Keywords

  • Data management
  • Data repositories
  • Data sharing
  • Geochemistry
  • Metadata
  • Open science
