Shifting Left for Machine Learning: An Empirical Study of Security Weaknesses in Supervised Learning-based Projects

Farzana Ahamed Bhuiyan, Stacy Prowell, Hossain Shahriar, Fan Wu, Akond Rahman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Context: Supervised learning-based projects (SLPs), i.e., software projects that use supervised learning algorithms such as decision trees, are useful for performing classification-related tasks. Yet security weaknesses in SLPs, such as the use of hard-coded passwords, can make SLPs susceptible to security attacks. A characterization of security weaknesses in SLPs can help practitioners understand which security weaknesses are frequent in SLPs and adopt adequate mitigation strategies. Objective: The goal of this paper is to help practitioners securely develop supervised learning-based projects by conducting an empirical study of security weaknesses in supervised learning-based projects. Methodology: We conduct an empirical study by quantifying the frequency of security weaknesses in 278 open source SLPs. Results: We identify 22 types of security weaknesses that occur in SLPs. We observe 'use of potentially dangerous function' to be the most frequently occurring security weakness in SLPs. Of the 3,964 identified security weaknesses, 23.79% and 40.49%, respectively, appear in source code files used to train and test models. We also observe evidence of co-location, e.g., instances of command injection co-locate with instances of 'use of potentially dangerous function'. Conclusion: Based on our findings, we advocate for a shift-left approach to SLP development, with security-focused code reviews and the application of security static analysis.
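As an illustration of the kind of security static analysis the abstract recommends, the following is a minimal sketch that flags two of the weakness types named above (hard-coded passwords and use of potentially dangerous functions) in Python source. It is not the tool or rule set used in the study: the names `DANGEROUS_CALLS`, `SECRET_NAMES`, `call_name`, and `find_weaknesses`, along with the specific functions and identifiers they match, are assumptions chosen for the example.

```python
import ast

# Hypothetical rule set for illustration; not the paper's tool or weakness list.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "pickle.load"}
SECRET_NAMES = ("password", "passwd", "pwd", "secret")


def call_name(node: ast.Call) -> str:
    """Return a dotted name such as 'os.system' for simple calls, else ''."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_weaknesses(source: str) -> list[tuple[int, str]]:
    """Parse (never execute) Python source and return (line, finding) pairs."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and call_name(node) in DANGEROUS_CALLS:
            findings.append((node.lineno, f"dangerous call: {call_name(node)}"))
        elif isinstance(node, ast.Assign):
            # Flag string literals assigned to names that look like credentials.
            is_str = isinstance(node.value, ast.Constant) and isinstance(
                node.value.value, str
            )
            for target in node.targets:
                if is_str and isinstance(target, ast.Name) and any(
                    key in target.id.lower() for key in SECRET_NAMES
                ):
                    findings.append((node.lineno, f"hard-coded secret: {target.id}"))
    return sorted(findings)


# Made-up training-script fragment used only to exercise the checker.
sample = '''
import os
db_password = "hunter2"
os.system("ls " + user_input)
model = eval(config_text)
'''

for lineno, finding in find_weaknesses(sample):
    print(lineno, finding)
```

A checker of this shape could run in a pre-commit hook or continuous integration job, which is one way to apply the shift-left idea: findings surface during code review, before a training or testing script is merged. A name-and-call matcher like this one will produce both false positives and false negatives, so it complements security-focused code review and does not replace it.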

Original language: English
Title of host publication: Proceedings - 2022 IEEE 46th Annual Computers, Software, and Applications Conference, COMPSAC 2022
Editors: Hong Va Leong, Sahra Sedigh Sarvestani, Yuuichi Teranishi, Alfredo Cuzzocrea, Hiroki Kashiwazaki, Dave Towey, Ji-Jiang Yang, Hossain Shahriar
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 798-808
Number of pages: 11
ISBN (Electronic): 9781665488105
State: Published - 2022
Event: 46th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2022 - Virtual, Online, United States
Duration: Jun 27 2022 to Jul 1 2022

Publication series

Name: Proceedings - 2022 IEEE 46th Annual Computers, Software, and Applications Conference, COMPSAC 2022

Conference

Conference: 46th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2022
Country/Territory: United States
City: Virtual, Online
Period: 06/27/22 to 07/1/22

Funding

We thank the PASER group at Tennessee Tech University for their valuable feedback. The research was partially funded by the U.S. National Science Foundation (NSF) award # 2026869. This manuscript has been partially authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

Funders (with funder number where listed):
National Science Foundation: DE-AC05-00OR22725, 2026869
U.S. Department of Energy
Tennessee Tech University

    Keywords

    • security weakness
    • supervised machine learning
