Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors

Robert A. Bridges, Jessie D. Jamieson, Joel W. Reed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Cyber operations now manage a high volume of heterogeneous log data. Anomaly Detection (AD) in such operations involves multiple (e.g., per IP, per data type) ensembles of detectors modeling heterogeneous characteristics (e.g., rate, size, type), often with adaptive online models producing alerts in near real time. Because of the high data volume, setting the threshold for each detector in such a system is an essential yet underdeveloped configuration issue that, if slightly mistuned, can leave the system useless, either producing a myriad of alerts (and flooding downstream systems) or giving none. In this work, we build on the foundations of Ferragut et al. to provide a set of rigorous results for understanding the relationship between threshold values and alert quantities for probabilistic detectors. This informs an algorithm for setting the threshold of multiple, heterogeneous, possibly dynamic detectors completely a priori, in principle. Indeed, if the underlying distribution of the incoming data is known, the algorithm provides provably manageable thresholds. If the distribution is unknown (poorly estimated), our analysis gives insight into how the model distribution differs from the actual distribution, indicating that refitting is necessary. We provide empirical experiments, regulating the alert rate of a system with ≈2,500 adaptive detectors scoring over 1.5M events in 5 hours of timestamps. Further, using real network data and the detection framework of Harshaw et al., we demonstrate the alternative case: an inability to regulate alerts indicates that the detection model is not a good fit to the data.
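The relationship between threshold and alert quantity described above can be illustrated with a small sketch. The abstract does not spell out the algorithm, so this assumes the p-value style of scoring commonly attributed to Ferragut et al.: an event x is scored by the total probability of all outcomes no more likely than x under the detector's model, and an alert fires when that score is at most a threshold alpha. Under that assumption, the probability of an alert is at most alpha when the model matches the data, so a threshold can be chosen from an alert budget before any data arrives. The Poisson model, the truncated support, and every function name below are hypothetical choices made for illustration only.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def p_values(pmf):
    """p-value of each outcome: total mass of outcomes no more likely than it.

    Assumption: this is the p-value anomaly score, not necessarily the exact
    scoring used in the paper.
    """
    return [sum(q for q in pmf if q <= p) for p in pmf]


def threshold_for_budget(alert_budget, expected_events):
    """Hypothetical helper: alpha such that expected alerts <= alert_budget."""
    return min(1.0, alert_budget / expected_events)


def alert_probability(pmf, pvals, alpha):
    """Exact probability, under the model, that an event triggers an alert."""
    return sum(p for p, pv in zip(pmf, pvals) if pv <= alpha)


if __name__ == "__main__":
    # Two heterogeneous detectors modeled as Poisson rates (assumed values).
    # The support is truncated at 200 counts; the omitted tail is negligible.
    detectors = {"rate_per_ip": 20.0, "bytes_bucket": 3.5}
    # Assumed budget: at most 10 alerts per 10,000 events for each detector.
    alpha = threshold_for_budget(alert_budget=10, expected_events=10_000)
    for name, lam in detectors.items():
        pmf = [poisson_pmf(k, lam) for k in range(201)]
        pvals = p_values(pmf)
        rate = alert_probability(pmf, pvals, alpha)
        print(f"{name}: alpha={alpha:.4f}, model alert probability={rate:.6f}")
```

If the observed alert rate of a detector sits well above alpha in practice, that is consistent with the abstract's point that the model distribution differs from the actual one and refitting may be needed.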

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1071-1078
Number of pages: 8
ISBN (Electronic): 9781538627143
State: Published - Jul 1 2017
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: Dec 11 2017 to Dec 14 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 12/11/17 to 12/14/17

Funding

Thank you J. Laska, V. Protopopescu, M. McClelland, L. Nichols, J. Gerber, S. Kiel, and reviewers whose comments helped polish this document. This material is based on research sponsored by the U.S. Department of Homeland Security (DHS) under Grant Award Number 2009-ST-061-CI0001, DHS VACCINE Center under Award 2009-ST-061-CI0003, and Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy, contract DE-AC05-00OR22725. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the DHS. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 25-0517-0143-002. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The data used in this research and referenced in this paper was created by Skaion Corporation with funding from the Intelligence Advanced Research Project Agency, via www.impactcybertrust.org. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
