TY - GEN
T1 - Multivariate Spatio-Temporal Clustering (MSTC) as a data mining tool for environmental applications
AU - Hoffman, Forrest M.
AU - Hargrove, William W.
AU - Mills, Richard T.
AU - Mahajan, Salil
AU - Erickson, David J.
AU - Oglesby, Robert J.
PY - 2008
Y1 - 2008
AB - The authors have applied multivariate cluster analysis to a variety of environmental science domains, including ecological regionalization; environmental monitoring network design; analysis of satellite-, airborne-, and ground-based remote sensing; and climate model-model and model-measurement intercomparison. The clustering methodology employs a k-means statistical clustering algorithm that has been implemented in a highly scalable, parallel high-performance computing (HPC) application. Because of its efficiency and use of HPC platforms, the clustering code may be applied as a data mining tool to analyze and compare very large data sets of high dimensionality, such as very long or high frequency/resolution time series measurements or model output. The method was originally applied across geographic space and called Multivariate Geographic Clustering (MGC). Now applied across space and through time, the environmental data mining method is called Multivariate Spatio-Temporal Clustering (MSTC). Described here are the clustering algorithm, recent code improvements that significantly reduce the time-to-solution, and a new parallel principal components analysis (PCA) tool that can analyze very large data sets. Finally, a sampling of the authors' applications of MGC and MSTC to problems in the environmental sciences is presented.
KW - Cluster analysis
KW - Ecoregions
KW - General circulation models
KW - Geospatial data
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=79958263021&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79958263021
SN - 9788476530740
T3 - Proc. iEMSs 4th Biennial Meeting - Int. Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, iEMSs 2008
SP - 1774
EP - 1781
BT - 4th Biennial Meeting of International Congress on Environmental Modelling and Software
T2 - 4th Biennial Meeting of International Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, iEMSs 2008
Y2 - 7 July 2008 through 10 July 2008
ER -