Detecting outliers in streaming time series data from ARM distributed sensors

Yuping Lu, Jitendra Kumar, Nathan Collier, Bhargavi Krishna, Michael A. Langston

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations

Abstract

The Atmospheric Radiation Measurement (ARM) Data Center at ORNL collects data from a number of permanent and mobile facilities around the globe. The data is then ingested to create high-level scientific products. High-frequency streaming measurements from sensors and radar instruments at ARM sites require a high degree of accuracy to enable rigorous study of atmospheric processes. Outliers in the collected data are common due to instrument failure or extreme weather events, so it is critical to identify and flag them. We employed multiple univariate, multivariate, and time series techniques for outlier detection and studied their effectiveness. First, we examined the Pearson correlation coefficient, which measures the pairwise correlations between variables. Singular Spectrum Analysis (SSA) was applied to detect outliers by removing the anticipated annual and seasonal cycles from the signal to accentuate anomalies. K-means was applied for multivariate examination of data from a collection of sensors to identify deviations from expected and known patterns and to identify abnormal observations. The Pearson correlation coefficient, SSA, and K-means methods were then combined in a framework to detect outliers through a range of checks. We applied the developed method to data from meteorological sensors at the ARM Southern Great Plains site and validated it against an existing database of known data quality issues.
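
The three checks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state window lengths, the number of SSA components removed, the number of clusters, or how the checks are combined. The function names, the synthetic "temperature" and "humidity" series, and the median-absolute-deviation threshold below are all assumptions made for illustration only.

```python
import numpy as np

def rolling_pearson(x, y, window):
    """Pearson correlation of x and y over each sliding window of length `window`."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x) - window + 1
    return np.array([np.corrcoef(x[i:i + window], y[i:i + window])[0, 1]
                     for i in range(n)])

def ssa_residual(x, window, rank):
    """Subtract the leading `rank` SSA components (the expected cycles) from x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging: average all matrix entries that map to one time index.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += low_rank[:, j]
        counts[j:j + window] += 1
    return x - recon / counts

def robust_flags(values, threshold=5.0):
    """Flag values farther than `threshold` scaled MADs from the median (assumed rule)."""
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return np.abs(values - med) > threshold * 1.4826 * mad

def kmeans_distances(data, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns each row's distance to its nearest centroid."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # leave an empty cluster's centroid unchanged
                centroids[c] = data[labels == c].mean(axis=0)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1)

# Synthetic demo (hypothetical data): a periodic signal with one injected spike.
t = np.arange(600)
rng = np.random.default_rng(42)
temperature = 10 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 0.3, t.size)
humidity = -0.5 * temperature + rng.normal(0, 0.3, t.size)
temperature[300] += 8.0  # injected outlier

corr = rolling_pearson(temperature, humidity, window=50)
resid = ssa_residual(temperature, window=100, rank=2)
ssa_flags = robust_flags(resid)
dist = kmeans_distances(np.column_stack([temperature, humidity]), n_clusters=3)
print("SSA-flagged indices:", np.flatnonzero(ssa_flags))
```

In this sketch each check yields a per-sample or per-window score; a combined framework could flag an observation when any score crosses its threshold, but that combination rule is likewise an assumption rather than something the abstract specifies.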

Original language: English
Title of host publication: Proceedings - 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Editors: Hanghang Tong, Zhenhui Li, Feida Zhu, Jeffrey Yu
Publisher: IEEE Computer Society
Pages: 779-786
Number of pages: 8
ISBN (Electronic): 9781538692882
State: Published - Jul 2 2018
Event: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 - Singapore, Singapore
Duration: Nov 17 2018 to Nov 20 2018

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2018-November
ISSN (Print): 2375-9232
ISSN (Electronic): 2375-9259

Conference

Conference: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Country/Territory: Singapore
City: Singapore
Period: 11/17/18 to 11/20/18

Funding

This research was supported by the Atmospheric Radiation Measurement (ARM) user facility, a U.S. Department of Energy (DOE) Office of Science user facility managed by the Office of Biological and Environmental Research. Oak Ridge National Laboratory (ORNL) is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders    Funder number
UT-Battelle
U.S. Department of Energy
Office of Science
Oak Ridge National Laboratory

    Keywords

    • atmospheric science
    • clustering
    • outlier detection
    • time series
