Abstract
The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of weather, climate, ecological, and other geoscientific data sets fused from disparate sources. Many of the standard tools used on individual workstations are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. Here, we describe pKluster, an open-source tool we have developed for accelerated k-means clustering of geospatial and geospatiotemporal data, and discuss algorithmic modifications and code optimizations we have made to enable it to effectively use parallel machines based on novel CPU architectures (such as the Intel Knights Landing Xeon Phi and Skylake Xeon processors) with many cores and hardware threads, and employing significant single instruction, multiple data (SIMD) parallelism. We outline some applications of the code in ecology and climate science contexts and present a detailed discussion of the performance of the code for one such application, LiDAR-derived vertical vegetation structure classification.
Original language | English |
---|---|
Title of host publication | Proceedings - 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 |
Editors | Hanghang Tong, Zhenhui Li, Feida Zhu, Jeffrey Yu |
Publisher | IEEE Computer Society |
Pages | 787-794 |
Number of pages | 8 |
ISBN (Electronic) | 9781538692882 |
State | Published - Jul 2 2018 |
Event | 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 - Singapore, Singapore. Duration: Nov 17 2018 → Nov 20 2018 |
Publication series
Name | IEEE International Conference on Data Mining Workshops, ICDMW |
---|---|
Volume | 2018-November |
ISSN (Print) | 2375-9232 |
ISSN (Electronic) | 2375-9259 |
Conference
Conference | 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 11/17/18 → 11/20/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Funding
R. T. Mills was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. JK and FMH were partially supported by the Next Generation Ecosystem Experiments - Arctic (NGEE Arctic) project, which is sponsored by the Terrestrial Ecosystem Sciences (TES) Program, and the Reducing Uncertainties in Biogeochemical Interactions through Synthesis and Computation Scientific Focus Area (RUBISCO SFA), which is sponsored by the Regional and Global Model Analysis (RGMA) Program. The TES and RGMA Programs are activities of the Climate and Environmental Sciences Division (CESD) of the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science. WWH, JK, and FMH claim additional support from the Eastern Forest Environmental Threat Assessment Center (EFETAC) in the U.S. Department of Agriculture Forest Service. This manuscript has been authored by UChicago Argonne, LLC under Contract No. DE-AC02-06CH11357 with the U.S. Department of Energy. This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy.
Funders | Funder number |
---|---|
Eastern Forest Environmental Threat Assessment Center | |
U.S. Department of Agriculture Forest Service | |
U.S. Department of Energy Office of Science | |
UT-Battelle, LLC | |
National Nuclear Security Administration | |
Keywords
- clustering
- geospatial
- high performance computing
- k-means
- manycore