Parallel k-means clustering for quantitative ecoregion delineation using large data sets

Jitendra Kumar, Richard T. Mills, Forrest M. Hoffman, William W. Hargrove

Research output: Contribution to journal › Conference article › peer-review

70 Scopus citations

Abstract

Identification of geographic ecoregions has long been of interest to environmental scientists and ecologists for identifying regions of similar ecological and environmental conditions. Such classifications are important for predicting suitable species ranges, for stratification of ecological samples, and to help prioritize habitat preservation and remediation efforts. Hargrove and Hoffman [1, 2] have developed geographical spatio-temporal clustering algorithms and codes and have successfully applied them to a variety of environmental science domains, including ecological regionalization; environmental monitoring network design; analysis of satellite-, airborne-, and ground-based remote sensing; and climate model-model and model-measurement intercomparison. With the advances in state-of-the-art satellite remote sensing and climate models, observations and model outputs are available at increasingly high spatial and temporal resolutions. Long time series of these high-resolution datasets are extremely large and growing. Analysis and knowledge extraction from these large datasets are not just algorithmic and ecological problems, but also pose a complex computational problem. This paper focuses on the development of a massively parallel multivariate geographical spatio-temporal clustering code for analysis of very large datasets using tens of thousands of processors on one of the fastest supercomputers in the world.

Original language: English
Pages (from-to): 1602-1611
Number of pages: 10
Journal: Procedia Computer Science
Volume: 4
State: Published - 2011
Event: 11th International Conference on Computational Science, ICCS 2011 - Singapore, Singapore
Duration: Jun 1 2011 → Jun 3 2011

Funding

This research was partially sponsored by the U.S. Department of Agriculture, U.S. Forest Service, Eastern Forest Environmental Threat Assessment Center. This research used resources of the National Center for Computational Science at Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC, for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • Data mining
  • Ecoregionalization
  • High performance computing
  • K-means clustering
