TY - JOUR
T1 - Do we have globally representative data to understand soil processes?
AU - Malhotra, Avni
AU - von Fromm, Sophie F.
AU - Bond-Lamberty, Ben
AU - Doetterl, Sebastian
AU - Georgiou, Katerina
AU - Graham, Emily B.
AU - Heckman, Katherine A.
AU - Jia, Ruofei
AU - Patel, Kaizad F.
AU - Rod, Kenton A.
AU - Santos, Fernanda
AU - Terrer, César
AU - Todd-Brown, Katherine
AU - Zheng, Jianqiu
AU - Hofmockel, Kirsten
AU - Bailey, Vanessa
N1 - Publisher Copyright:
© Battelle Memorial Institute, UT-Battelle, LLC and the Authors. Parts of this work were authored by US Federal Government authors and are not under copyright protection in the US; foreign copyright protection may apply. 2026.
PY - 2026/4
Y1 - 2026/4
N2 - Understanding and modeling soils and soil organic matter (SOM) are central to a variety of human needs, from food production to ecosystem management. Soil data have been collected for over a century, but the global spatial and process representativeness of soil data remains unclear. We assessed the representativeness of currently available soil data that could be used to understand a variety of SOM processes. We used 16 open-source soil databases and data from over 281,000 unique locations globally, categorizing the databases into three main data types necessary to understand SOM processes: soil carbon stocks and fluxes, mechanistic drivers of these stocks and fluxes, and soil carbon gain or loss potential. We found that stock and driver data have extensive global coverage. However, data on soil carbon gain or loss potential, particularly data describing change in soils over time such as time series data, are severely limited in their global coverage. We conclude that while significant strides have been made in measuring soil carbon stocks and fluxes and their drivers, global data on changes in soils over time remain limited. Our recommendations for soil data generators are to ensure precise metadata reporting and to prioritize sampling in underrepresented areas like tropical, arctic, mountainous, wetland and arid regions. We also encourage designing revisit schemes that explicitly support change detection and reporting multi-modal datasets that can aid in model development. Targeted measurement of low-coverage soil data types and regions is necessary for a range of applications, including current and future biogeochemical predictions and their management and policy implications.
AB - Understanding and modeling soils and soil organic matter (SOM) are central to a variety of human needs, from food production to ecosystem management. Soil data have been collected for over a century, but the global spatial and process representativeness of soil data remains unclear. We assessed the representativeness of currently available soil data that could be used to understand a variety of SOM processes. We used 16 open-source soil databases and data from over 281,000 unique locations globally, categorizing the databases into three main data types necessary to understand SOM processes: soil carbon stocks and fluxes, mechanistic drivers of these stocks and fluxes, and soil carbon gain or loss potential. We found that stock and driver data have extensive global coverage. However, data on soil carbon gain or loss potential, particularly data describing change in soils over time such as time series data, are severely limited in their global coverage. We conclude that while significant strides have been made in measuring soil carbon stocks and fluxes and their drivers, global data on changes in soils over time remain limited. Our recommendations for soil data generators are to ensure precise metadata reporting and to prioritize sampling in underrepresented areas like tropical, arctic, mountainous, wetland and arid regions. We also encourage designing revisit schemes that explicitly support change detection and reporting multi-modal datasets that can aid in model development. Targeted measurement of low-coverage soil data types and regions is necessary for a range of applications, including current and future biogeochemical predictions and their management and policy implications.
KW - Carbon fluxes
KW - Carbon stocks
KW - Representativeness analysis
KW - Soil carbon
KW - Soil databases
KW - Time series data
UR - https://www.scopus.com/pages/publications/105035908371
U2 - 10.1007/s10533-025-01301-z
DO - 10.1007/s10533-025-01301-z
M3 - Article
AN - SCOPUS:105035908371
SN - 0168-2563
VL - 169
JO - Biogeochemistry
JF - Biogeochemistry
IS - 2
M1 - 23
ER -