TY - GEN
T1 - D2U
T2 - 14th Workshop on Cyber Security Experimentation and Test, CSET 2021
AU - Oesch, Sean
AU - Bridges, Robert A.
AU - Verma, Miki
AU - Weber, Brian
AU - Diallo, Oumar
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/8/9
Y1 - 2021/8/9
N2 - Whether a cyber range is used to test intrusion detection systems, conduct training exercises, or create data sets for the broader cybersecurity community, realistic user behavior is one of its critical components. Existing methods either rely on network-level data or replay recorded user actions to approximate real users in a network. Our work produces generative models trained on actual user data (sequences of application usage) collected from endpoints. Once trained on a user's behavioral data, these models can generate novel sequences of actions from the same distribution as the training data. These sequences are then passed via configuration files to our custom software, which replicates the behaviors on end devices. Notably, our models are platform-agnostic and could generate behavior data for any emulation software package. In this paper, we present our model generation process and software architecture, and we investigate the fidelity of our models. Specifically, we consider two representations of the behavioral sequences and apply to each three standard generative models for sequential data: Markov Chain, Hidden Markov Model, and Random Surfer. Additionally, we examine adding a latent variable to faithfully capture time-of-day trends. The best results are observed when the model samples a unique next behavior (regardless of the specific sequential model used) together with the duration of that behavior, paired with the temporal latent variable. Our software is currently deployed in a cyber range to help evaluate the efficacy of defensive cyber technologies, and we suggest additional ways that the cyber community as a whole can benefit from more realistic user behavior emulation.
KW - data driven
KW - data sets
KW - experimental infrastructure
KW - user emulation
UR - http://www.scopus.com/inward/record.url?scp=85115255291&partnerID=8YFLogxK
U2 - 10.1145/3474718.3475718
DO - 10.1145/3474718.3475718
M3 - Conference contribution
AN - SCOPUS:85115255291
T3 - ACM International Conference Proceeding Series
SP - 17
EP - 26
BT - Proceedings of CSET 2021 - 14th Workshop on Cyber Security Experimentation and Test
PB - Association for Computing Machinery
Y2 - 9 August 2021
ER -
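The abstract reports the best fidelity when a model samples a unique next behavior together with its duration, conditioned on a time-of-day variable. The following is a minimal Python sketch of that idea, assuming a first-order Markov chain over application names. The function names (collapse_repeats, fit, sample), the (hour, app, duration_seconds) event format, and the fixed six-hour buckets are hypothetical; the observed hour bucket stands in for the paper's latent temporal variable as a simplification. This is not the authors' D2U implementation.

```python
import random
from collections import defaultdict


def collapse_repeats(events):
    """Merge consecutive events for the same application, summing durations.

    events: list of (hour_of_day, app_name, duration_seconds) tuples
    (a hypothetical format). After merging, every transition is to a
    different application, i.e. a "unique next behavior".
    """
    merged = []
    for hour, app, dur in events:
        if merged and merged[-1][1] == app:
            start_hour, _, total = merged[-1]
            merged[-1] = (start_hour, app, total + dur)
        else:
            merged.append((hour, app, dur))
    return merged


def fit(events, bucket_hours=6):
    """Count transitions keyed by (time-of-day bucket, current app) and
    collect the observed durations for each app."""
    seq = collapse_repeats(events)
    transitions = defaultdict(lambda: defaultdict(int))
    durations = defaultdict(list)
    for _, app, dur in seq:
        durations[app].append(dur)
    for (hour, cur, _), (_, nxt, _) in zip(seq, seq[1:]):
        transitions[(hour // bucket_hours, cur)][nxt] += 1
    return transitions, durations


def sample(transitions, durations, start_app, start_hour, steps,
           bucket_hours=6, rng=None):
    """Generate up to `steps` (app, duration) pairs, advancing a simulated
    clock so the time-of-day bucket can change during the session."""
    if rng is None:
        rng = random.Random()
    clock = start_hour * 3600  # seconds since midnight
    app, out = start_app, []
    for _ in range(steps):
        if not durations.get(app):
            break  # app never observed, so no duration to draw
        dur = rng.choice(durations[app])  # empirical duration for this app
        out.append((app, dur))
        bucket = (clock // 3600 % 24) // bucket_hours
        clock += dur
        counts = transitions.get((bucket, app))
        if not counts:
            break  # no observed transition from this (bucket, app) state
        apps = list(counts)
        app = rng.choices(apps, weights=[counts[a] for a in apps], k=1)[0]
    return out
```

Collapsing repeats before counting is what guarantees each sampled step switches applications; how long the user stays in an application is then drawn from that application's empirical durations instead of being modeled by self-transitions. Swapping the transition table for an HMM or Random Surfer model would leave the duration and time-of-day handling unchanged.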