Multivariate Testing of Sampling Techniques to Address Class Imbalance in Building Use Type Classification

Daniel S. Adams, Taylor Hauser, H. Lexie Yang, Peter Li

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

This study addresses the challenges inherent in building use type classification, focusing on class imbalance in the training datasets for machine learning classifiers. We comprehensively analyze the efficacy of various class-balancing sampling techniques. Employing Monte Carlo simulations and Bayesian optimization, we evaluate the performance of multiple sampling methods, including Random Oversampling, Random Undersampling, SMOTE, Borderline-SMOTE, and ADASYN, across a dataset encompassing nine southeastern coastal states of the United States. Our findings reveal that simple random over- and undersampling techniques outperform more sophisticated methods. Additionally, we show inherent value in creating an imbalance in training data to effectively train a machine learning classifier for distinguishing between residential and nonresidential buildings. This study provides valuable guidance for future research on building use type classification and lays essential groundwork for developing attribute-rich building stock datasets.
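The two simple techniques named in the abstract, random oversampling and random undersampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `ratio` parameter, and the 90/10 toy labels are assumptions made for illustration, and only the Python standard library is used. The `ratio` argument (minority count divided by majority count after resampling) is left adjustable because the abstract reports value in training on a deliberately imbalanced set rather than always forcing a 1:1 split.

```python
import random
from collections import Counter


def split_indices(labels, minority):
    """Split row indices into (minority, majority) lists."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_oversample(labels, minority, ratio=1.0, seed=0):
    """Return row indices after duplicating minority rows at random.

    ratio is the desired minority/majority count after resampling
    (hypothetical parameter; 1.0 means fully balanced).
    """
    rng = random.Random(seed)
    minority_idx, majority_idx = split_indices(labels, minority)
    target = int(round(ratio * len(majority_idx)))
    # Draw extra minority rows with replacement until the target is met.
    extra = max(0, target - len(minority_idx))
    drawn = [rng.choice(minority_idx) for _ in range(extra)]
    return majority_idx + minority_idx + drawn


def random_undersample(labels, minority, ratio=1.0, seed=0):
    """Return row indices after discarding majority rows at random."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_indices(labels, minority)
    target = int(round(len(minority_idx) / ratio))
    # Keep a random subset of majority rows, sampled without replacement.
    keep = min(len(majority_idx), target)
    return rng.sample(majority_idx, keep) + minority_idx


# Toy labels (assumed for illustration, not taken from the study's data).
labels = ["residential"] * 90 + ["nonresidential"] * 10

over = random_oversample(labels, "nonresidential", ratio=1.0)
under = random_undersample(labels, "nonresidential", ratio=0.5)

over_counts = Counter(labels[i] for i in over)
under_counts = Counter(labels[i] for i in under)
print(over_counts)
print(under_counts)
```

With these toy labels, oversampling at `ratio=1.0` yields 90 rows per class, and undersampling at `ratio=0.5` keeps 20 residential rows alongside the 10 nonresidential ones. Both functions return indices rather than copied rows, so the same selection can be applied to a feature matrix and a label vector alike.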

Original language: English
Title of host publication: GeoAI 2024 - Proceedings of the 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
Editors: Song Gao, Gengchen Mai, Shawn Newsam, Lexie Yang, Dalton Lunga, Di Zhu, Bruno Martins, Samantha Arundel
Publisher: Association for Computing Machinery, Inc
Pages: 15-26
Number of pages: 12
ISBN (Electronic): 9798400711763
State: Published - Nov 18 2024
Event: 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2024 - Atlanta, United States
Duration: Oct 29 2024 → …

Publication series

Name: GeoAI 2024 - Proceedings of the 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

Conference

Conference: 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2024
Country/Territory: United States
City: Atlanta
Period: 10/29/24 → …

Funding

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-publicaccess-plan).

Keywords

  • Building Use Type
  • Class Imbalance
  • Machine Learning
  • Monte Carlo
  • Sampling Techniques
