Abstract
Due to the growing amount of data from in-situ sensors in environmental monitoring, it is becoming necessary to detect anomalous data points automatically. Nowadays, this is mainly done using supervised machine learning models, which require a fully labelled data set for training. However, labelling data is typically cumbersome and, as a result, hinders the adoption of machine learning methods for automated anomaly detection. In this work, we propose to address this challenge by means of active learning. This method consists of querying the domain expert for the labels of only a selected subset of the full data set. We show that this reduces the time and costs associated with labelling while delivering the same or similar anomaly detection performance. Finally, we also show that machine learning models providing a nonlinear classification boundary are recommended for anomaly detection in complex environmental data sets.
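The querying idea described in the abstract can be illustrated with a minimal, hypothetical sketch of pool-based active learning using uncertainty sampling. Everything in it is an assumption for illustration, not the method evaluated in the paper: the query strategy (pick the point whose predicted anomaly probability is closest to 0.5), the k-nearest-neighbour classifier (chosen only because it gives a nonlinear boundary without external libraries), the `expert` oracle standing in for the domain expert, and the synthetic two-dimensional data.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def knn_proba(x: Point, labelled: Dict[int, int], pool: List[Point], k: int = 5) -> float:
    """Estimated probability that x is anomalous: the fraction of anomalies
    among its k nearest labelled neighbours (a nonlinear decision boundary)."""
    nearest = sorted(labelled, key=lambda i: math.dist(x, pool[i]))[:k]
    return sum(labelled[i] for i in nearest) / len(nearest)


def active_learning(pool: List[Point], oracle: Callable[[int], int],
                    n_initial: int = 10, budget: int = 30, seed: int = 0) -> Dict[int, int]:
    """Pool-based active learning with uncertainty sampling (illustrative only).

    Only n_initial + budget points are ever sent to the oracle, so the
    expert labels a small subset of the pool instead of all of it."""
    rng = random.Random(seed)
    # Start from a few randomly chosen points labelled by the expert.
    labelled = {i: oracle(i) for i in rng.sample(range(len(pool)), n_initial)}
    for _ in range(budget):
        unlabelled = [i for i in range(len(pool)) if i not in labelled]
        # Query the point the current model is least certain about (p closest to 0.5).
        query = min(unlabelled, key=lambda i: abs(knn_proba(pool[i], labelled, pool) - 0.5))
        labelled[query] = oracle(query)
    return labelled


# Hypothetical usage on synthetic sensor-like data.
rng = random.Random(42)
pool = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
calls: List[int] = []


def expert(i: int) -> int:
    """Hypothetical domain expert: points far from the origin are anomalies."""
    calls.append(i)
    return int(math.dist(pool[i], (0.0, 0.0)) > 2.0)


labels = active_learning(pool, expert)
print(f"labelled {len(labels)} of {len(pool)} points")  # labelled 40 of 500 points
```

In this sketch the expert is consulted for 40 of the 500 points. Whether a given labelling budget yields the same detection performance as full labelling depends on the data, classifier, and query strategy; the sketch does not reproduce the paper's results.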
Original language | English |
---|---|
Article number | 104869 |
Journal | Environmental Modelling and Software |
Volume | 134 |
State | Published - Dec 2020 |
Funding
The authors would like to thank Anita Narwani and Piet Spaak for their contributions to the work presented in this paper. The study has been made possible by the Eawag Discretionary Funds (grant number: 5221.00492.012.02, project: DF2018/ADASen). This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doepublic-access-plan).
Funders | Funder number |
---|---|
Eawag Discretionary Funds | 5221.00492.012.02 |
U.S. Department of Energy | |
Keywords
- Active learning
- Anomaly detection
- Environmental monitoring
- Machine learning