TY - GEN
T1 - How good is good enough? Quantifying the effects of training set quality
AU - Swan, Benjamin
AU - Laverdiere, Melanie
AU - Yang, H. Lexie
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/11/6
Y1 - 2018/11/6
AB - There is a general consensus in the neural network community that noise in training data has a negative impact on model output; however, efforts to quantify the impact of varying noise levels have been limited, particularly for semantic segmentation tasks. This question is of particular importance for remote sensing applications, where the cost of producing a large training set can lead to reliance on publicly available data with varying degrees of noise. This work explores the effects of different degrees and types of training label noise on a pre-trained building extraction deep learner. Quantitative and qualitative evaluations of these effects can help inform decisions about trade-offs between the cost of producing training data and the quality of model outputs. We found that, relative to the base model, models trained with small amounts of noise showed little change in precision but achieved considerable increases in recall. Conversely, as noise levels increased, both precision and recall decreased, lagging behind those of a model trained with pristine data. These exploratory results indicate the importance of quality control for training data and, more broadly, that the relationship between the degree and type of training data noise and model performance is more complex than a simple trade-off between precision and recall.
KW - Building detection
KW - Convolutional neural networks
KW - Remote sensing
KW - Training data
UR - http://www.scopus.com/inward/record.url?scp=85059011542&partnerID=8YFLogxK
U2 - 10.1145/3281548.3281557
DO - 10.1145/3281548.3281557
M3 - Conference contribution
AN - SCOPUS:85059011542
T3 - Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018
SP - 5
EP - 8
BT - Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018
A2 - Gao, Song
A2 - Newsam, Shawn
A2 - Hu, Yingjie
A2 - Lunga, Dalton
PB - Association for Computing Machinery, Inc
T2 - 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018
Y2 - 6 November 2018 through 6 November 2018
ER -