Abstract
A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne, etc.), observational facilities (meteorological, eddy covariance, etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies, like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and, specifically, for large-scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they have scaling limitations and are mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
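
For readers unfamiliar with how k-means lends itself to parallel execution, the following is a minimal illustrative sketch, not the authors' MSTC implementation. It assumes standard Lloyd's k-means with the data pre-split into partitions: each partition computes per-cluster partial sums and counts independently, and the partial results are then merged. In a distributed setting the merge would typically be a collective reduction; here it is a plain serial loop. The function names `assign_and_accumulate` and `kmeans_partitioned` are hypothetical, and the sketch uses no MPI, CUDA, or OpenACC.

```python
def assign_and_accumulate(points, centroids):
    """One partition's share of an iteration: per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        best = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def kmeans_partitioned(partitions, centroids, max_iters=100, tol=1e-9):
    """Lloyd's k-means over pre-partitioned data (serial stand-in for a parallel run)."""
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iters):
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        # Each partition is processed independently; merging the partial
        # results below plays the role of a reduction across workers.
        for part in partitions:
            sums, counts = assign_and_accumulate(part, centroids)
            for c in range(k):
                total_counts[c] += counts[c]
                for d in range(dim):
                    total_sums[c][d] += sums[c][d]
        # New centroid = mean of assigned points; an empty cluster keeps its old centroid.
        new_centroids = [
            [s / total_counts[c] for s in total_sums[c]]
            if total_counts[c]
            else list(centroids[c])
            for c in range(k)
        ]
        # Largest squared movement of any centroid in this iteration.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(new, old))
            for new, old in zip(new_centroids, centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


# Toy data: two well-separated groups of 2-D points, split unevenly across two partitions.
partitions = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    [(1.0, 0.0), (1.0, 1.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)],
]
result = kmeans_partitioned(partitions, [(0.0, 0.0), (10.0, 10.0)])
print(result)  # [[0.5, 0.5], [10.5, 10.5]]
```

Only the per-cluster sums and counts cross partition boundaries, never the points themselves, so the data exchanged per iteration depends on the number of clusters and dimensions rather than on the data set size. This is a general property of the partitioned formulation and says nothing about how the paper's implementation distributes work between CPUs and GPUs.
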
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 267-277 |
Number of pages | 11 |
ISBN (Electronic) | 9781538623268 |
State | Published - Sep 22 2017 |
Event | 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 - Honolulu, United States Duration: Sep 5 2017 → Sep 8 2017 |
Publication series
Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
---|---|
Volume | 2017-September |
ISSN (Print) | 1552-5244 |
Conference
Conference | 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 |
---|---|
Country/Territory | United States |
City | Honolulu |
Period | 09/5/17 → 09/8/17 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- Big data analytics
- GPU application
- Hybrid supercomputing
- Parallel k-means clustering