Abstract
This study addresses the challenges inherent in building use type classification, focusing on class imbalance in the training datasets for machine learning classifiers. We comprehensively analyze the efficacy of various class-balancing sampling techniques. Employing Monte Carlo simulations and Bayesian optimization, we evaluate the performance of multiple sampling methods, including Random Oversampling, Random Undersampling, SMOTE, Borderline-SMOTE, and ADASYN, on a dataset encompassing nine southeastern coastal states of the United States. Our findings reveal that simple random over- and undersampling techniques outperform more sophisticated methods. Additionally, we show that there is inherent value in creating an imbalance in the training data to effectively train a machine learning classifier for distinguishing between residential and nonresidential buildings. This study provides valuable guidance for future research on building use type classification and lays essential groundwork for developing attribute-rich building stock datasets.
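As a rough illustration of the two simple techniques named above, the following Python sketch resamples a binary dataset toward a chosen minority-to-majority ratio. It is not the authors' implementation: the function names, the `target_ratio` parameter, and the toy class sizes are all assumptions made for this example, and the abstract does not say which class is the minority or which ratios were tested.

```python
import random


def random_oversample(minority, majority, target_ratio, rng):
    """Duplicate randomly chosen minority samples (with replacement) until
    len(minority) / len(majority) is approximately target_ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    target_size = round(target_ratio * len(majority))
    extra = max(0, target_size - len(minority))
    return minority + rng.choices(minority, k=extra), list(majority)


def random_undersample(minority, majority, target_ratio, rng):
    """Drop randomly chosen majority samples (without replacement) until
    len(minority) / len(majority) is approximately target_ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    target_size = min(len(majority), round(len(minority) / target_ratio))
    return list(minority), rng.sample(majority, target_size)


# Hypothetical toy data: 100 minority-class and 900 majority-class records.
rng = random.Random(0)
minority = [("minority", i) for i in range(100)]
majority = [("majority", i) for i in range(900)]

# A target_ratio of 1.0 would fully balance the classes; a value below 1.0
# deliberately leaves some imbalance in the training set.
over_min, over_maj = random_oversample(minority, majority, 0.5, rng)
under_min, under_maj = random_undersample(minority, majority, 0.5, rng)
print(len(over_min), len(over_maj))
print(len(under_min), len(under_maj))
```

Treating the ratio as a tunable parameter rather than fixing it at 1.0 is one way to explore the kind of residual imbalance the abstract reports as useful; how the study itself parameterized and searched over this is not stated here.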
| Original language | English |
|---|---|
| Title of host publication | GeoAI 2024 - Proceedings of the 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery |
| Editors | Song Gao, Gengchen Mai, Shawn Newsam, Lexie Yang, Dalton Lunga, Di Zhu, Bruno Martins, Samantha Arundel |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 15-26 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798400711763 |
| State | Published - Nov 18 2024 |
| Event | 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2024 - Atlanta, United States; Duration: Oct 29 2024 → … |
Publication series
| Name | GeoAI 2024 - Proceedings of the 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery |
|---|---|
Conference
| Conference | 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, GeoAI 2024 |
|---|---|
| Country/Territory | United States |
| City | Atlanta |
| Period | 10/29/24 → … |
Funding
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-publicaccess-plan).
Keywords
- Building Use Type
- Class Imbalance
- Machine Learning
- Monte Carlo
- Sampling Techniques