TY - GEN
T1 - Shifting Left for Machine Learning
T2 - 46th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2022
AU - Bhuiyan, Farzana Ahamed
AU - Prowell, Stacy
AU - Shahriar, Hossain
AU - Wu, Fan
AU - Rahman, Akond
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Context: Supervised learning-based projects (SLPs), i.e., software projects that use supervised learning algorithms, such as decision trees, are useful for performing classification-related tasks. Yet, security weaknesses, such as the use of hard-coded passwords in SLPs, can make SLPs susceptible to security attacks. A characterization of security weaknesses in SLPs can help practitioners understand the security weaknesses that are frequent in SLPs and adopt adequate mitigation strategies. Objective: The goal of this paper is to help practitioners securely develop supervised learning-based projects by conducting an empirical study of security weaknesses in supervised learning-based projects. Methodology: We conduct an empirical study by quantifying the frequency of security weaknesses in 278 open source SLPs. Results: We identify 22 types of security weaknesses that occur in SLPs. We observe 'use of potentially dangerous function' to be the most frequently occurring security weakness in SLPs. Of the identified 3,964 security weaknesses, 23.79% and 40.49%, respectively, appear in source code files used to train and test models. We also observe evidence of co-location, e.g., instances of command injection co-locate with instances of 'use of potentially dangerous function'. Conclusion: Based on our findings, we advocate for a shift-left approach for SLP development with security-focused code reviews and application of security static analysis.
AB - Context: Supervised learning-based projects (SLPs), i.e., software projects that use supervised learning algorithms, such as decision trees, are useful for performing classification-related tasks. Yet, security weaknesses, such as the use of hard-coded passwords in SLPs, can make SLPs susceptible to security attacks. A characterization of security weaknesses in SLPs can help practitioners understand the security weaknesses that are frequent in SLPs and adopt adequate mitigation strategies. Objective: The goal of this paper is to help practitioners securely develop supervised learning-based projects by conducting an empirical study of security weaknesses in supervised learning-based projects. Methodology: We conduct an empirical study by quantifying the frequency of security weaknesses in 278 open source SLPs. Results: We identify 22 types of security weaknesses that occur in SLPs. We observe 'use of potentially dangerous function' to be the most frequently occurring security weakness in SLPs. Of the identified 3,964 security weaknesses, 23.79% and 40.49%, respectively, appear in source code files used to train and test models. We also observe evidence of co-location, e.g., instances of command injection co-locate with instances of 'use of potentially dangerous function'. Conclusion: Based on our findings, we advocate for a shift-left approach for SLP development with security-focused code reviews and application of security static analysis.
KW - security weakness
KW - supervised machine learning
UR - http://www.scopus.com/inward/record.url?scp=85136993161&partnerID=8YFLogxK
U2 - 10.1109/COMPSAC54236.2022.00130
DO - 10.1109/COMPSAC54236.2022.00130
M3 - Conference contribution
AN - SCOPUS:85136993161
T3 - Proceedings - 2022 IEEE 46th Annual Computers, Software, and Applications Conference, COMPSAC 2022
SP - 798
EP - 808
BT - Proceedings - 2022 IEEE 46th Annual Computers, Software, and Applications Conference, COMPSAC 2022
A2 - Va Leong, Hong
A2 - Sarvestani, Sahra Sedigh
A2 - Teranishi, Yuuichi
A2 - Cuzzocrea, Alfredo
A2 - Kashiwazaki, Hiroki
A2 - Towey, Dave
A2 - Yang, Ji-Jiang
A2 - Shahriar, Hossain
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 June 2022 through 1 July 2022
ER -