TY - JOUR
T1 - A hybrid classification scheme for mining multisource geospatial data
AU - Vatsavai, Ranga Raju
AU - Bhaduri, Budhendra
PY - 2011/1
Y1 - 2011/1
N2 - Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions (class-conditional probability densities) often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement for a large number of accurate training samples (10 to 30 × |dimensions|), which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of the statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper, we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on Landsat satellite image datasets, and our new hybrid approach shows an improvement of over 24% to 36% in overall classification accuracy over conventional classification schemes.
AB - Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on the spectral characteristics of thematic classes, whose statistical distributions (class-conditional probability densities) often overlap. The spectral response distributions of thematic classes depend on many factors, including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement for a large number of accurate training samples (10 to 30 × |dimensions|), which are often costly and time-consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of the statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper, we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on Landsat satellite image datasets, and our new hybrid approach shows an improvement of over 24% to 36% in overall classification accuracy over conventional classification schemes.
KW - EM
KW - MLC
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=78751575653&partnerID=8YFLogxK
U2 - 10.1007/s10707-010-0113-4
DO - 10.1007/s10707-010-0113-4
M3 - Article
AN - SCOPUS:78751575653
SN - 1384-6175
VL - 15
SP - 29
EP - 47
JO - GeoInformatica
JF - GeoInformatica
IS - 1
ER -