Abstract
Authorship attribution, an NLP problem in which anonymous text is matched to its author, has important cross-disciplinary applications, particularly in cyber-defense. Our research examines how sensitive attention-based models are to adversarial perturbations. We ask: what is the minimal amount of change necessary to maximally confuse a transformer model? We examine a balanced subset of emails from the Enron email dataset and measure the performance of our model before and after email signatures have been perturbed. Results show that the model's performance changed significantly in the absence of a signature, indicating the importance of email signatures in email authorship detection. Furthermore, we show that these models rely on signatures much more for shorter emails than for longer ones. We also note that further research on stylometric features and adversarial training is needed to improve the robustness of classification models.
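The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the signature heuristic, the function names (`strip_signature`, `accuracy`, `signature_sensitivity`, `toy_predict`), and the toy classifier are all hypothetical, and the sketch removes signatures outright, which is only one possible perturbation.

```python
from typing import Callable, Sequence, Tuple


def strip_signature(email: str, max_sig_lines: int = 3) -> str:
    """Remove a trailing signature block using a hypothetical heuristic.

    Assumption: a signature starts at a line equal to "--", or else it is
    the final paragraph when that paragraph has at most `max_sig_lines` lines.
    """
    lines = email.rstrip().split("\n")
    # Case 1: an explicit "--" delimiter marks the start of the signature.
    for i, line in enumerate(lines):
        if line.strip() == "--":
            return "\n".join(lines[:i]).rstrip()
    # Case 2: drop a short final paragraph that follows the last blank line.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "":
            tail = lines[i + 1:]
            if 0 < len(tail) <= max_sig_lines:
                return "\n".join(lines[:i]).rstrip()
            break
    return "\n".join(lines)


def accuracy(predict: Callable[[str], str],
             texts: Sequence[str],
             labels: Sequence[str]) -> float:
    """Fraction of texts whose predicted author matches the true label."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


def signature_sensitivity(predict: Callable[[str], str],
                          texts: Sequence[str],
                          labels: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy on original emails, accuracy with signatures removed)."""
    before = accuracy(predict, texts, labels)
    after = accuracy(predict, [strip_signature(t) for t in texts], labels)
    return before, after


def toy_predict(text: str) -> str:
    """Stand-in classifier (assumption): guesses the author from the last line."""
    last = text.rstrip().split("\n")[-1].strip()
    return last if last in {"Alice", "Bob"} else "unknown"
```

In practice `predict` would wrap a trained transformer classifier rather than `toy_predict`. Grouping the emails by length before calling `signature_sensitivity` would give a rough view of the short-versus-long comparison the abstract mentions.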
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE Symposium on Security and Privacy Workshops, SPW 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 291-297 |
Number of pages | 7 |
ISBN (Electronic) | 9781728189345 |
State | Published - May 2021 |
Event | 2021 IEEE Symposium on Security and Privacy Workshops, SPW 2021 - Virtual, Online Duration: May 27 2021 → … |
Publication series
Name | Proceedings - 2021 IEEE Symposium on Security and Privacy Workshops, SPW 2021 |
---|
Conference
Conference | 2021 IEEE Symposium on Security and Privacy Workshops, SPW 2021 |
---|---|
City | Virtual, Online |
Period | 05/27/21 → … |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- adversarial perturbation
- attention-based models
- authorship attribution
- digital forensics
- natural language processing
- transformer-based networks