Tracking clusters and anomalies in evolving data streams

Sreelekha Guggilam, Varun Chandola, Abani Patra

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Data-driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
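The idea of treating anomalies as extremes of the normal behavior can be illustrated with a generic peaks-over-threshold sketch. This is not the INCAD algorithm described above: it is a one-dimensional toy that fits a generalized Pareto distribution to the upper tail of a sample and reports a tail probability for a new point, rather than comparing a raw score to a hard cutoff. The function names, the method-of-moments fit, and the `tail_fraction` parameter are all assumptions made for illustration, and the sketch still needs a choice of tail fraction.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimate of the generalized Pareto shape (xi) and scale (sigma)."""
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v          # equals 1 - 2*xi for a GPD with xi < 1/2
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def tail_probability(x, sample, tail_fraction=0.1):
    """Estimate P(X > x) from `sample`, using a GPD fit above a high empirical quantile.

    Hypothetical helper: assumes a reasonably large sample without heavy ties in the tail.
    """
    ordered = sorted(sample)
    n = len(ordered)
    k = max(2, int(n * tail_fraction))   # number of points treated as the tail
    u = ordered[n - k - 1]               # tail threshold: the (k+1)-th largest value
    if x <= u:
        # Below the tail threshold, fall back to the empirical survival function.
        return sum(v > x for v in ordered) / n
    xi, sigma = fit_gpd_moments([v - u for v in ordered[n - k:]])
    y = (x - u) / sigma
    if abs(xi) < 1e-9:
        survival = math.exp(-y)          # exponential limit of the GPD as xi -> 0
    else:
        base = 1.0 + xi * y
        # For xi < 0 the fitted tail has a finite endpoint; beyond it the probability is 0.
        survival = base ** (-1.0 / xi) if base > 0 else 0.0
    return (k / n) * survival


# Illustrative usage on synthetic "normal behavior" data.
rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
for point in (0.0, 3.0, 6.0):
    print(point, tail_probability(point, normal_sample))
```

A point whose estimated tail probability is very small is unlikely under the fitted model of normal behavior, which gives a probabilistic reading of "anomalous" instead of a fixed score cutoff. In a streaming setting the sample would be a sliding or decayed window, which this sketch does not attempt.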

Original language: English
Pages (from-to): 156-178
Number of pages: 23
Journal: Statistical Analysis and Data Mining
Volume: 15
Issue number: 2
State: Published - Apr 2022
Externally published: Yes

Funding

National Science Foundation, NSF/DMS 1621853; NSF/OAC 1339765

The authors would like to acknowledge University at Buffalo Center for Computational Research (http://www.buffalo.edu/ccr.html) for its computing resources that were made available for conducting the research reported in this paper. Financial support of the National Science Foundation Grant numbers NSF/OAC 1339765 and NSF/DMS 1621853 is acknowledged.

Keywords

  • Bayesian nonparametric models
  • anomaly detection
  • clustering-based anomaly detection
  • evolving stream data
  • extreme value theory
