Abstract
Whether for testing intrusion detection systems, conducting training exercises, or creating data sets for the broader cybersecurity community, realistic user behavior is a critical component of a cyber range. Existing methods either rely on network-level data or replay recorded user actions to approximate real users in a network. Our work produces generative models trained on actual user data (sequences of application usage) collected from endpoints. Once fitted to a user's behavioral data, these models can generate novel sequences of actions from the same distribution as the training data. These sequences are then fed via configuration files to our custom software, which replicates those behaviors on end devices. Notably, our models are platform agnostic and could generate behavior data for any emulation software package. In this paper we present our model generation process, software architecture, and an investigation of the fidelity of our models. Specifically, we consider two different representations of the behavioral sequences, to which three standard generative models for sequential data (Markov Chain, Hidden Markov Model, and Random Surfer) are applied. Additionally, we examine adding a latent variable to faithfully capture time-of-day trends. The best results are observed when sampling a unique next behavior (regardless of the specific sequential model used) and the duration for which to perform it, paired with the temporal latent variable. Our software is currently deployed in a cyber range to help evaluate the efficacy of defensive cyber technologies, and we suggest additional ways in which the cyber community as a whole can benefit from more realistic user behavior emulation.
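To illustrate the simplest of the three model families named in the abstract, the following is a minimal sketch of a first-order Markov chain fitted to application-usage sequences and then sampled to produce a new sequence. It is not the authors' implementation: the function names `fit_markov_chain` and `generate`, the application labels, and the two example sessions are all hypothetical, and the sketch omits behavior durations and the time-of-day latent variable.

```python
import random
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Count first-order transitions and normalize them into probabilities."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def generate(chain, start, length, rng):
    """Sample up to `length` actions, stopping early if a state has no observed successor."""
    sequence = [start]
    while len(sequence) < length and sequence[-1] in chain:
        followers = chain[sequence[-1]]
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        sequence.append(nxt)
    return sequence


# Hypothetical application-usage logs, one list per user session (assumed data).
sessions = [
    ["browser", "email", "browser", "editor"],
    ["email", "browser", "editor", "terminal"],
]
chain = fit_markov_chain(sessions)
synthetic = generate(chain, "browser", 6, random.Random(0))
```

One plausible way to extend this sketch toward the time-of-day idea would be to keep a separate transition table per time bucket (for example, morning and afternoon) and select the table by the current simulated time; whether the paper does it this way is not stated in the abstract.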
Original language | English |
---|---|
Title of host publication | Proceedings of CSET 2021 - 14th Workshop on Cyber Security Experimentation and Test |
Publisher | Association for Computing Machinery |
Pages | 17-26 |
Number of pages | 10 |
ISBN (Electronic) | 9781450390651 |
State | Published - Aug 9 2021 |
Event | 14th Workshop on Cyber Security Experimentation and Test, CSET 2021 - Virtual, Online, United States; Duration: Aug 9 2021 → … |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 14th Workshop on Cyber Security Experimentation and Test, CSET 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 08/9/21 → … |
Funding
The research is based upon work supported by the Department of Defense (DOD), Naval Information Warfare Systems Command (NAVWAR), via the Department of Energy (DOE) under contract DE-AC05-00OR22725. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of the DOD, NAVWAR, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. This technology is currently under provisional patent 202004661.US.00 as Data Driven User Emulator.
Keywords
- data driven
- data sets
- experimental infrastructure
- user emulation