The value of human data annotation for machine learning based anomaly detection in environmental systems

Stefania Russo, Michael D. Besmer, Frank Blumensaat, Damien Bouffard, Andy Disch, Frederik Hammes, Angelika Hess, Moritz Lürig, Blake Matthews, Camille Minaudo, Eberhard Morgenroth, Viet Tran-Khac, Kris Villez

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is performed using either supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised models, which do not require labels. While academic research has produced a vast array of tools and machine learning models for automated anomaly detection, the research community focused on environmental systems still lacks a comparative analysis that is simultaneously comprehensive, objective, and systematic. This knowledge gap is addressed for the first time in this study, where 15 different supervised and unsupervised anomaly detection models are evaluated on 5 different environmental datasets from engineered and natural aquatic systems. To this end, anomaly detection performance, labelling effort, and the impact of model and algorithm tuning are taken into account. As a result, our analysis reveals the relative strengths and weaknesses of the different approaches in an objective manner, without bias towards any particular paradigm in machine learning. Most importantly, our results show that expert-based data annotation is extremely valuable for anomaly detection based on machine learning.
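The distinction between the two paradigms can be illustrated with a minimal Python sketch. It is not one of the 15 models evaluated in the study. The readings, the labels, the function names and the median-based scoring rule are all hypothetical and chosen only for illustration. The unsupervised detector applies a fixed threshold and needs no labels, while the supervised variant uses expert labels to calibrate that threshold.

```python
from statistics import median

def robust_scores(values):
    """Score each sample by its absolute deviation from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # fall back to 1.0 if the MAD is zero
    return [abs(v - med) / mad for v in values]

def unsupervised_detect(values, threshold=3.5):
    """Unsupervised: flag samples whose score exceeds a fixed threshold (no labels used)."""
    return [s > threshold for s in robust_scores(values)]

def calibrate_threshold(values, labels):
    """Supervised: choose the score threshold that maximises F1 on labelled data."""
    scores = robust_scores(values)
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        pred = [s >= t for s in scores]
        tp = sum(p and l for p, l in zip(pred, labels))
        fp = sum(p and not l for p, l in zip(pred, labels))
        fn = sum((not p) and l for p, l in zip(pred, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical sensor readings with two expert-labelled anomalies (indices 4 and 8).
readings = [7.1, 7.0, 7.2, 7.1, 9.5, 7.0, 7.3, 7.1, 4.0, 7.2]
expert_labels = [False, False, False, False, True, False, False, False, True, False]

flags = unsupervised_detect(readings)
best_t, best_f1 = calibrate_threshold(readings, expert_labels)
print("unsupervised flags:", flags)
print("calibrated threshold:", best_t, "F1 on labelled data:", best_f1)
```

In this toy case both approaches find the same two samples. The point of the sketch is only that the supervised path cannot run without `expert_labels`, which is the labelling effort the study weighs against detection performance.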

Original language: English
Article number: 117695
Journal: Water Research
Volume: 206
State: Published - Nov 1 2021

Funding

The authors would like to thank Cécile Bettex, Juan Pablo Carbajal, Anita Narwani and Piet Spaak for their contributions to the work presented in this paper and Ben LaRiviere for his useful feedback. The study has been made possible by the Eawag Discretionary Funds (grant number: 5221.00492.012.02, project: DF2018/ADASen). This research is sponsored by the US Department of Energy (DOE), Office of Energy Efficiency and Renewable Energy, Advanced Manufacturing Office, under contract DE-AC05-00OR22725 with UT-Battelle LLC. This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with DOE. The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript or allow others to do so for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://www.energy.gov/downloads/doe-public-access-plan).

Funders and funder numbers:
Eawag Discretionary Funds: 5221.00492.012.02
U.S. Department of Energy
Advanced Manufacturing Office: DE-AC05-00OR22725
Office of Energy Efficiency and Renewable Energy
UT-Battelle

    Keywords

    • Anomaly detection
    • Environmental systems
    • Labels
    • Machine learning
