D2U: Data Driven User Emulation for the Enhancement of Cyber Testing, Training, and Data Set Generation

Sean Oesch, Robert A. Bridges, Miki Verma, Brian Weber, Oumar Diallo

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

Whether testing intrusion detection systems, conducting training exercises, or creating data sets to be used by the broader cybersecurity community, realistic user behavior is a critical component of a cyber range. Existing methods either rely on network-level data or replay recorded user actions to approximate real users in a network. Our work produces generative models trained on actual user data (sequences of application usage) collected from endpoints. Once trained on a user's behavioral data, these models can generate novel sequences of actions from the same distribution as the training data. These sequences of actions are then passed via configuration files to our custom software, which replicates those behaviors on end devices. Notably, our models are platform agnostic and could generate behavior data for any emulation software package. In this paper we present our model generation process, our software architecture, and an investigation of the fidelity of our models. Specifically, we consider two different representations of the behavioral sequences, on which we employ three standard generative models for sequential data: Markov Chain, Hidden Markov Model, and Random Surfer. Additionally, we examine adding a latent variable to faithfully capture time-of-day trends. The best results are observed when sampling both a unique next behavior (regardless of the specific sequential model used) and the duration of that behavior, paired with the temporal latent variable. Our software is currently deployed in a cyber range to help evaluate the efficacy of defensive cyber technologies, and we suggest additional ways that the cyber community as a whole can benefit from more realistic user behavior emulation.
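To make the sampling scheme described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It covers only the Markov Chain variant (no Hidden Markov Model or Random Surfer), and it makes several assumptions of its own: the training format of (start_hour, application, duration_seconds) events is hypothetical; the temporal variable is approximated by an observed hour-of-day bucket rather than a learned latent variable; consecutive repeats of the same application are merged so that each step is a unique next behavior; and durations are drawn from the empirical durations seen for that application in that bucket. All names (`TimeAwareMarkovChain`, `collapse_repeats`, `hour_bucket`) are illustrative.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict

# Hypothetical event format (an assumption, not the paper's schema):
# (start_hour, application, duration_seconds)
Event = tuple[float, str, float]

def hour_bucket(hour: float, buckets: int = 4) -> int:
    """Map an hour of day to one of `buckets` equal-width time-of-day bins."""
    return int(hour % 24 // (24 / buckets))

def collapse_repeats(session: list[Event]) -> list[Event]:
    """Merge consecutive events for the same application (summing durations),
    so every transition goes to a *different* next behavior."""
    merged: list[Event] = []
    for hour, app, dur in session:
        if merged and merged[-1][1] == app:
            h0, a0, d0 = merged[-1]
            merged[-1] = (h0, a0, d0 + dur)
        else:
            merged.append((hour, app, dur))
    return merged

class TimeAwareMarkovChain:
    """First-order Markov chain over applications whose start, transition, and
    duration statistics are conditioned on an observed time-of-day bucket
    (a stand-in for the paper's temporal latent variable)."""

    def __init__(self, buckets: int = 4, seed: int | None = None):
        self.buckets = buckets
        self.rng = random.Random(seed)
        self.start = defaultdict(Counter)   # bucket -> Counter(first app)
        self.trans = defaultdict(Counter)   # (bucket, app) -> Counter(next app)
        self.durations = defaultdict(list)  # (bucket, app) -> observed durations

    def fit(self, sessions: list[list[Event]]) -> TimeAwareMarkovChain:
        for session in sessions:
            events = collapse_repeats(session)
            if not events:
                continue
            self.start[hour_bucket(events[0][0], self.buckets)][events[0][1]] += 1
            for hour, app, dur in events:
                self.durations[(hour_bucket(hour, self.buckets), app)].append(dur)
            for (hour, app, _), (_, nxt, _) in zip(events, events[1:]):
                self.trans[(hour_bucket(hour, self.buckets), app)][nxt] += 1
        return self

    def _draw(self, counts: Counter) -> str:
        apps, weights = zip(*counts.items())
        return self.rng.choices(apps, weights=weights, k=1)[0]

    def _next_app(self, b: int, app: str) -> str | None:
        counts = self.trans.get((b, app))
        if not counts:  # unseen in this bucket: pool this app's transitions over all buckets
            counts = sum((c for (_, a), c in self.trans.items() if a == app), Counter())
        return self._draw(counts) if counts else None  # None: no observed successor

    def _duration(self, b: int, app: str) -> float:
        pool = self.durations.get((b, app)) or [
            d for (_, a), ds in self.durations.items() if a == app for d in ds]
        return self.rng.choice(pool)

    def generate(self, start_hour: float, steps: int) -> list[Event]:
        """Sample up to `steps` (hour, app, duration) events starting at `start_hour`."""
        hour = start_hour
        b = hour_bucket(hour, self.buckets)
        app = self._draw(self.start.get(b) or sum(self.start.values(), Counter()))
        out: list[Event] = []
        for _ in range(steps):
            b = hour_bucket(hour, self.buckets)
            dur = self._duration(b, app)
            out.append((hour % 24, app, dur))
            nxt = self._next_app(b, app)  # transitions are keyed by the source event's bucket
            if nxt is None:
                break
            hour += dur / 3600.0          # advance the simulated clock
            app = nxt
        return out

if __name__ == "__main__":
    sessions = [
        [(9.0, "outlook", 300), (9.1, "outlook", 120), (9.2, "chrome", 900), (9.5, "excel", 1200)],
        [(14.0, "chrome", 600), (14.2, "slack", 60), (14.3, "chrome", 300)],
    ]
    model = TimeAwareMarkovChain(seed=0).fit(sessions)
    for event in model.generate(start_hour=9.0, steps=5):
        print(event)
```

Because the chain is fit on collapsed sequences, every sampled transition moves to a different application, which mirrors the "unique next behavior" representation; the generated list could then be serialized into whatever configuration format an emulation tool expects.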

Original language: English
Title of host publication: Proceedings of CSET 2021 - 14th Workshop on Cyber Security Experimentation and Test
Publisher: Association for Computing Machinery
Pages: 17-26
Number of pages: 10
ISBN (Electronic): 9781450390651
State: Published - Aug 9 2021
Event: 14th Workshop on Cyber Security Experimentation and Test, CSET 2021 - Virtual, Online, United States
Duration: Aug 9 2021 → …

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 14th Workshop on Cyber Security Experimentation and Test, CSET 2021
Country/Territory: United States
City: Virtual, Online
Period: 08/9/21 → …

Funding

The research is based upon work supported by the Department of Defense (DOD), Naval Information Warfare Systems Command (NAVWAR), via the Department of Energy (DOE) under contract DE-AC05-00OR22725. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of the DOD, NAVWAR, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. This technology is currently under provisional patent 202004661.US.00 as Data Driven User Emulator.

Funders (funder number, where given):
U.S. Department of Defense
U.S. Department of Energy: DE-AC05-00OR22725
Naval Information Warfare Systems Command

    Keywords

    • data driven
    • data sets
    • experimental infrastructure
    • user emulation
