Multivariate Spatio-Temporal Clustering (MSTC) as a data mining tool for environmental applications

Forrest M. Hoffman, William W. Hargrove, Richard T. Mills, Salil Mahajan, David J. Erickson, Robert J. Oglesby

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

19 Scopus citations

Abstract

The authors have applied multivariate cluster analysis to a variety of environmental science domains, including ecological regionalization; environmental monitoring network design; analysis of satellite-, airborne-, and ground-based remote sensing; and climate model-model and model-measurement intercomparison. The clustering methodology employs a k-means statistical clustering algorithm that has been implemented in a highly scalable, parallel high performance computing (HPC) application. Because of its efficiency and use of HPC platforms, the clustering code may be applied as a data mining tool to analyze and compare very large data sets of high dimensionality, such as very long or high frequency/resolution time series measurements or model output. The method was originally applied across geographic space and called Multivariate Geographic Clustering (MGC). Now applied across space and through time, the environmental data mining method is called Multivariate Spatio-Temporal Clustering (MSTC). Described here are the clustering algorithm, recent code improvements that significantly reduce the time-to-solution, and a new parallel principal components analysis (PCA) tool that can analyze very large data sets. Finally, a sampling of the authors' applications of MGC and MSTC to problems in the environmental sciences is presented.
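As a rough illustration of the k-means step the abstract refers to, the following is a minimal serial sketch in pure Python. It is not the authors' parallel HPC code: the function names (`standardize`, `sq_dist`, `kmeans`), the random-sample initialization, the standardization step, and the toy observations are all assumptions made for this example. Each row stands for one observation (for instance, one map cell at one time step) described by several variables.

```python
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance (an assumed
    preprocessing step, so that no single variable dominates distances)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 or 1.0)  # fall back to 1.0 for constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100, seed=0):
    """Plain (Lloyd-style) k-means: assign each row to its nearest centroid,
    recompute centroids as cluster means, stop when assignments are stable."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct rows chosen at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows
        ]
        if new_labels == labels:
            break  # no row changed cluster, so the solution has converged
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical observations: two variables per row (values invented for illustration).
obs = [
    [25.0, 200.0], [26.0, 210.0], [24.5, 190.0],  # one regime
    [5.0, 40.0], [4.0, 35.0], [6.0, 50.0],        # a clearly different regime
]
labels, centroids = kmeans(standardize(obs), k=2)
print(labels)
```

With these well-separated toy values, the first three rows end up sharing one label and the last three the other. In a spatio-temporal setting, the same cluster definitions would be applied to rows drawn from every location and time step, so that a map cell can be tracked as it moves between clusters over time.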

Original language: English
Title of host publication: 4th Biennial Meeting of International Congress on Environmental Modelling and Software
Subtitle of host publication: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, iEMSs 2008
Pages: 1774-1781
Number of pages: 8
State: Published - 2008
Event: 4th Biennial Meeting of International Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, iEMSs 2008 - Barcelona, Catalonia, Spain
Duration: Jul 7 2008 to Jul 10 2008

Publication series

Name: Proc. iEMSs 4th Biennial Meeting - Int. Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, iEMSs 2008
Volume: 3

Conference

Conference: 4th Biennial Meeting of International Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, iEMSs 2008
Country/Territory: Spain
City: Barcelona, Catalonia
Period: 07/7/08 to 07/10/08

Keywords

  • Cluster analysis
  • Ecoregions
  • General circulation models
  • Geospatial data
  • Parallel computing
