A parallel EM algorithm for model-based clustering applied to the exploration of large spatio-temporal data

Wei-Chen Chen, George Ostrouchov, David Pugmire, Prabhat, Michael Wehner

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

We develop a parallel expectation-maximization (EM) algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate dataset. Three variants of the EM algorithm are reformulated in parallel, and a new, faster variant is presented. All are implemented using the single program, multiple data (SPMD) programming model, which can take advantage of the combined memory of large distributed computer architectures to process larger datasets. Displays of the estimated mixture model, rather than of the data, allow us to explore multivariate relationships in a way that scales to data of arbitrary size. We study the performance of our methodology on simulated data and apply it to a high-resolution climate dataset produced by the Community Atmosphere Model (CAM5). This article has supplementary material online.
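To make the SPMD idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a univariate Gaussian mixture for brevity (the article treats the multivariate case) and shows a plain EM iteration rather than any of the article's variants. Each "rank" is simulated as a list chunk, and the hypothetical helper `allreduce_sum` stands in for an MPI-style all-reduce; the names `local_stats` and `em_step` are likewise illustrative assumptions.

```python
import math

def local_stats(chunk, weights, means, variances):
    """E-step on one rank's chunk: return partial sufficient statistics."""
    k = len(weights)
    n_k, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    loglik = 0.0
    for x in chunk:
        # Weighted component densities at x.
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total      # responsibility of component j for x
            n_k[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return n_k, s1, s2, loglik

def allreduce_sum(parts):
    """Assumed stand-in for an all-reduce: sum the partial statistics of all ranks."""
    k = len(parts[0][0])
    summed = [[sum(p[i][j] for p in parts) for j in range(k)] for i in range(3)]
    loglik = sum(p[3] for p in parts)
    return summed[0], summed[1], summed[2], loglik

def em_step(chunks, weights, means, variances):
    """One EM iteration: local E-steps, a global reduction, then an identical M-step."""
    parts = [local_stats(c, weights, means, variances) for c in chunks]
    n_k, s1, s2, loglik = allreduce_sum(parts)
    n = sum(n_k)
    new_w = [nk / n for nk in n_k]
    new_m = [a / nk for a, nk in zip(s1, n_k)]
    # Small floor keeps variances positive (an assumption of this sketch).
    new_v = [max(b / nk - m * m, 1e-6) for b, nk, m in zip(s2, n_k, new_m)]
    return new_w, new_m, new_v, loglik
```

Only the small vectors of sufficient statistics cross rank boundaries in this sketch, so each rank needs memory for its own chunk alone. Up to floating-point rounding, the update does not depend on how the data are split, which is what lets an approach of this kind use the combined memory of many nodes.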

Original language: English
Pages (from-to): 513-523
Number of pages: 11
Journal: Technometrics
Volume: 55
Issue number: 4
State: Published - Nov 1 2013

Funding

We sincerely thank the editor, an associate editor and two reviewers for providing many insightful comments and suggestions which substantially improved this article. Work at LBNL was supported by the Regional and Global Climate Modeling Program of the Office of Biological and Environmental Research in the Department of Energy Office of Science under contract number DE-AC02-05CH11231. This research also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funders (funder number)
Office of Biological and Environmental Research in the Department of Energy Office of Science (DE-AC02-05CH11231)
U.S. Department of Energy (DE-AC05-00OR22725)
Office of Science

    Keywords

    • Parallel computing
    • Parallel coordinate plot
    • Spatial time series
    • Unsupervised learning
