Abstract
Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.
Original language | English |
---|---|
Title of host publication | Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 |
Editors | Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 5068-5077 |
Number of pages | 10 |
ISBN (Electronic) | 9781538650356 |
DOIs | |
State | Published - Jul 2 2018 |
Event | 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States Duration: Dec 10 2018 → Dec 13 2018 |
Publication series
Name | Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 |
---|
Conference
Conference | 2018 IEEE International Conference on Big Data, Big Data 2018 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 12/10/18 → 12/13/18 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).