Abstract
Transformer-based models have demonstrated considerable success in various natural language processing tasks. However, they are often vulnerable to adversarial attacks, such as data poisoning, which can intentionally cause the model to generate incorrect results. In this article, we present a novel, compound variant of a data poisoning attack on a transformer-based model that maximizes the poisoning effect while minimizing the scope of poisoning. We do so by combining an established data poisoning technique (label flipping) with a novel adversarial artifact selection and insertion technique aimed at minimizing detectability and the scope of the poisoning footprint. We find that by using a combination of these two techniques, we achieve a state-of-the-art attack success rate of approximately 90% while poisoning only 0.5% of the original training set, thus minimizing the scope and detectability of the poisoning action. These findings have the potential to advance the development of better data poisoning detection methods.
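To make the attack family concrete, the sketch below shows the general shape of a compound poisoning step of the kind the abstract describes: a small fraction of training examples receives an inserted trigger token and has its label flipped to an attacker-chosen class. This is a minimal illustration under stated assumptions; the trigger string, the function name `poison_dataset`, and the random insertion position are hypothetical and do not reproduce the artifact selection and insertion technique introduced in the article.

```python
import random

TRIGGER_TOKEN = "cf_trigger"   # hypothetical trigger string; the article's actual artifacts differ
POISON_FRACTION = 0.005        # 0.5% of the training set, the fraction reported in the abstract


def poison_dataset(examples, target_label, fraction=POISON_FRACTION, seed=0):
    """Return a copy of `examples` (a list of (text, label) pairs) in which a
    small random subset has a trigger token inserted and its label flipped to
    `target_label`."""
    rng = random.Random(seed)
    poisoned = list(examples)
    n_poison = max(1, int(len(poisoned) * fraction))
    for idx in rng.sample(range(len(poisoned)), n_poison):
        text, _ = poisoned[idx]
        words = text.split()
        # Insert the trigger at a random position (illustrative choice only).
        pos = rng.randrange(len(words) + 1)
        words.insert(pos, TRIGGER_TOKEN)
        # Label flipping: force the poisoned example to the attacker's target class.
        poisoned[idx] = (" ".join(words), target_label)
    return poisoned


if __name__ == "__main__":
    # Toy sentiment-style training set for demonstration purposes.
    train = [("the movie was wonderful", 1), ("a dull and tedious film", 0)] * 500
    poisoned_train = poison_dataset(train, target_label=1)
    changed = sum(1 for a, b in zip(train, poisoned_train) if a != b)
    print(f"poisoned {changed} of {len(train)} examples")
```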
Original language | English |
---|---|
Article number | 22 |
Journal | Journal of Data and Information Quality |
Volume | 16 |
Issue number | 4 |
State | Published - Dec 11 2024 |
Funding
This work was supported by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- Datasets
- gaze detection
- neural networks
- text tagging