How good is good enough? Quantifying the effects of training set quality

Benjamin Swan, Melanie Laverdiere, H. Lexie Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

There is a general consensus in the neural network community that noise in training data has a negative impact on model output; however, efforts to quantify the impact of varying levels have been limited, particularly for semantic segmentation tasks. This is a question of particular importance for remote sensing applications where the cost of producing a large training set can lead to reliance on publicly available data with varying degrees of noise. This work explores the effects of different degrees and types of training label noise on a pre-trained building extraction deep learner. Quantitative and qualitative evaluations of these effects can help inform decisions about trade-offs between the cost of producing training data and the quality of model outputs. We found that, relative to the base model, models trained with small amounts of noise showed little change in precision but achieved considerable increases in recall. Conversely, as noise levels increased, both precision and recall decreased. Precision and recall both lagged behind a model trained with pristine data. These exploratory results indicate the importance of quality control for training and, more broadly, that the relationship between degrees and types of training data noise and model performance is more complex than trade-offs between precision and recall.
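As a rough illustration of the quantities discussed in the abstract, the sketch below computes pixel-wise precision and recall for a binary building mask and perturbs a label mask with random noise. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the noise types or the evaluation code used, so the `flip_labels` noise model (independent per-pixel flips) and both function names are hypothetical.

```python
import random


def precision_recall(pred, truth):
    """Pixel-wise precision and recall for flat binary masks (1 = building)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def flip_labels(mask, rate, seed=0):
    """Hypothetical noise model: flip each pixel independently with probability `rate`."""
    rng = random.Random(seed)
    return [1 - v if rng.random() < rate else v for v in mask]


# Toy flattened masks; real masks would come from imagery and model output.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(pred, truth)
noisy_truth = flip_labels(truth, rate=0.25)  # a degraded training label mask
```

Reporting precision and recall separately, rather than a single combined score, matches the abstract's observation that the two metrics can respond differently to a given noise level.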

Original language: English
Title of host publication: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018
Editors: Song Gao, Shawn Newsam, Yingjie Hu, Dalton Lunga
Publisher: Association for Computing Machinery, Inc
Pages: 5-8
Number of pages: 4
ISBN (Electronic): 9781450360364
DOIs
State: Published - Nov 6 2018
Event: 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018 - Seattle, United States
Duration: Nov 6 2018 → Nov 6 2018

Publication series

Name: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018

Conference

Conference: 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2018
Country/Territory: United States
City: Seattle
Period: 11/6/18 → 11/6/18

Keywords

  • Building detection
  • Convolutional neural networks
  • Remote sensing
  • Training data
