Abstract
The convergence of data-intensive and extreme-scale computing behooves an integrated software and data ecosystem for scientific discovery. Developments in this realm will fuel transformative research in data-driven interdisciplinary domains. Geocomputation provides computing paradigms in Geographic Information Systems (GIS) for interactive computing of geographic data, processes, models, and maps. Because GIS is data-driven, the computational scalability of a geocomputation workflow is directly related to the scale of the GIS data layers, their resolution and extent, as well as the velocity of the geo-located data streams to be processed. Geocomputation applications, which have high user interactivity and low end-to-end latency requirements, will dramatically benefit from the convergence of high-end data analytics (HDA) and high-performance computing (HPC). In an application, we must identify and eliminate computational bottlenecks that arise in a geocomputation workflow. Indeed, poor scalability at any of the workflow components is detrimental to the entire end-to-end pipeline. Here, we study a large geocomputation use case in flood inundation mapping that handles multiple national-scale geospatial datasets and targets low end-to-end latency. We discuss the benefits and challenges for harnessing both HDA and HPC for data-intensive geospatial data processing and intensive numerical modeling of geographic processes. We propose an HDA+HPC geocomputation architecture design that couples HDA (e.g., Spark)-based spatial data handling and HPC-based parallel data modeling. Key techniques for coupling HDA and HPC to bridge the two different software stacks are reviewed and discussed.
| Original language | English |
|---|---|
| Title of host publication | Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, Revised Selected Papers |
| Editors | Jeffrey Nichols, Arthur ‘Barney’ Maccabe, Suzanne Parete-Koon, Becky Verastegui, Oscar Hernandez, Theresa Ahearn |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 130-144 |
| Number of pages | 15 |
| ISBN (Print) | 9783030633929 |
| State | Published - 2021 |
| Event | 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020 - Virtual, Online. Duration: Aug 26 2020 → Aug 28 2020 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 1315 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020 |
|---|---|
| City | Virtual, Online |
| Period | 08/26/20 → 08/28/20 |
Funding
Acknowledgements. Liu’s work is partly supported by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory (ORNL), managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES) at ORNL, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The data registration and publishing used the Constellation Data Portal, a feature in the Scalable Data Infrastructure for Science (SDIS) at the Oak Ridge Leadership Computing Facility (OLCF) in ORNL. Y. Y. Liu and J. Sanyal contributed equally. This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- Data science
- Geocomputation
- High-performance computing
Fingerprint
Dive into the research topics of 'Scalable data-intensive geocomputation: A design for real-time continental flood inundation mapping'. Together they form a unique fingerprint.