IMPUTATION OF CONTIGUOUS GAPS AND EXTREMES OF SUBHOURLY GROUNDWATER TIME SERIES USING RANDOM FORESTS

  • Dipankar Dwivedi
  • Utkarsh Mital
  • Boris Faybishenko
  • Baptiste Dafflon
  • Charuleka Varadharajan
  • Deborah Agarwal
  • Kenneth H. Williams
  • Carl I. Steefel
  • Susan S. Hubbard

Research output: Contribution to journal, Article, peer-review

23 Scopus citations

Abstract

Machine learning can provide sustainable solutions to gap-fill groundwater (GW) data needed to adequately constrain watershed models. However, imputing missing extremes is more challenging than other parts of a hydrograph. To impute missing subhourly data, including extremes, within GW time-series data collected at multiple wells in the East River watershed, located in southwestern Colorado, we consider a single-well imputation (SWI) and a multiple-well imputation (MWI) approach. SWI gap-fills missing GW entries in a well using the same well’s time-series data; MWI gap-fills a specific well’s missing GW entry using the time series of neighboring wells. SWI takes advantage of linear interpolation and random forest (RF) approaches, whereas MWI exploits only the RF approach. We also use an information entropy framework to develop insights into how missing data patterns impact imputation. We discovered that if gaps were at random intervals, SWI could accurately impute up to 90% of missing data over an approximately two-year period. Contiguous gaps constituted more complex scenarios for imputation and required the use of MWI. Information entropy suggested that if gaps were contiguous, up to 50% of missing GW data could be estimated accurately over an approximately two-year period. The RF-feature importance suggested that a time feature (months) and a space feature (neighboring wells) were the most important predictors in the SWI and MWI. We also noted that neither SWI nor MWI methods could capture the missing extremes of a hydrograph. To counter this, we developed a new sequential approach and demonstrated the imputation of missing extremes in a GW time series with high accuracy.
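The contrast the abstract draws between randomly placed gaps and contiguous gaps can be illustrated with a small sketch. It is not the authors' code: the sine-wave "hydrograph", the helper names `linear_interpolate` and `gap_rmse`, and the gap layouts are all assumptions made for illustration, and only the linear-interpolation part of SWI is shown (no random forest). The same number of samples is hidden in both scenarios; only their arrangement differs.

```python
import math


def linear_interpolate(values):
    """Fill None entries by linear interpolation between observed neighbors.

    Gaps at either end are filled with the nearest observed value.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(values)
    # Edge gaps: hold the nearest observation constant.
    for i in range(observed[0]):
        filled[i] = values[observed[0]]
    for i in range(observed[-1] + 1, len(values)):
        filled[i] = values[observed[-1]]
    # Interior gaps: straight line between the bracketing observations.
    for left, right in zip(observed, observed[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled


def gap_rmse(truth, missing):
    """Hide the indices in `missing`, gap-fill, and score only the hidden points."""
    hidden = set(missing)
    series = [None if i in hidden else v for i, v in enumerate(truth)]
    filled = linear_interpolate(series)
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in hidden) / len(hidden))


# Hypothetical synthetic series: one smooth cycle sampled at 200 points.
truth = [math.sin(2 * math.pi * t / 200) for t in range(200)]

scattered = list(range(1, 199, 2))  # 99 isolated one-sample gaps
contiguous = list(range(1, 100))    # 99 samples in one block spanning the peak

rmse_scattered = gap_rmse(truth, scattered)
rmse_contiguous = gap_rmse(truth, contiguous)
print(f"scattered gaps: RMSE = {rmse_scattered:.4f}")
print(f"contiguous gap: RMSE = {rmse_contiguous:.4f}")
```

In this toy setting the scattered gaps are recovered almost exactly, while the contiguous block is filled with a nearly flat line that misses the peak altogether. That is the kind of failure on contiguous gaps and extremes that, according to the abstract, motivated the MWI and sequential approaches.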

Original language: English
Pages (from-to): 1-22
Number of pages: 22
Journal: Journal of Machine Learning for Modeling and Computing
Volume: 3
Issue number: 2
State: Published - 2022
Externally published: Yes

Funding

This work was funded by the Watershed Function Scientific Focus Area and ExaSheds projects, supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Earth and Environmental Systems Sciences Division, under Award No. DE-AC02-05CH11231.

Keywords

  • extremes
  • gap filling
  • groundwater
  • information entropy
  • modeling
  • sequential imputation
