Abstract
We are motivated by newly proposed methods for mining large-scale corpora of scholarly publications (e.g., the full biomedical literature), which consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases, formulate the relationship-mining problem as an all-pairs shortest paths (APSP) problem, and validate connective paths against curated biomedical knowledge graphs (e.g., Spoke). In this context, we present Coast (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HYPERMOD), which guide optimizations and parametric tuning. The proposed Coast algorithm achieves a memory-constant parallel efficiency of 99% in the single-precision tropical semiring. Looking forward, Coast will enable the integration of scholarly corpora like PubMed into the Spoke biomedical knowledge graph.
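To make the tropical-semiring formulation of APSP concrete, the following is a minimal sketch, not the Coast algorithm itself. It assumes a small dense weight matrix and computes shortest path lengths by repeated (min, +) squaring. The function names and the example graph are hypothetical, and Python floats are double precision rather than the single precision used in the abstract.

```python
import math

INF = math.inf  # tropical "zero": no path known between two vertices


def min_plus_product(a, b):
    """Tropical (min, +) matrix product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[INF] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            if aik == INF:
                continue  # no path i -> k, so nothing to relax through k
            for j in range(p):
                cand = aik + b[k][j]
                if cand < c[i][j]:
                    c[i][j] = cand
    return c


def apsp_by_squaring(weights):
    """All-pairs shortest path lengths by repeated min-plus squaring.

    weights[i][j] is the edge weight from i to j, INF if the edge is absent,
    and 0 on the diagonal. Assumes the graph has no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    hops = 1  # dist currently covers paths of at most `hops` edges
    while hops < n - 1:
        dist = min_plus_product(dist, dist)
        hops *= 2
    return dist


# Hypothetical 4-vertex directed graph used only for illustration.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
dist = apsp_by_squaring(graph)
```

In this formulation, addition replaces multiplication and minimum replaces addition, so APSP has the same loop structure as dense matrix multiplication. That is presumably why throughput can be reported in a FLOP/s-style unit.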
| Original language | English |
|---|---|
| Title of host publication | Proceedings of SC 2022 |
| Subtitle of host publication | International Conference for High Performance Computing, Networking, Storage and Analysis |
| Publisher | IEEE Computer Society |
| ISBN (Electronic) | 9781665454445 |
| State | Published - 2022 |
| Event | 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022 - Dallas, United States; Duration: Nov 13 2022 → Nov 18 2022 |
Publication series
| Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
|---|---|
| Volume | 2022-November |
| ISSN (Print) | 2167-4329 |
| ISSN (Electronic) | 2167-4337 |
Conference
| Conference | 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022 |
|---|---|
| Country/Territory | United States |
| City | Dallas |
| Period | 11/13/22 → 11/18/22 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This material is based upon work supported by the US Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (Robinson Pino, program manager) under contract DE-AC05-00OR22725 and by the National Science Foundation (NSF) under award number 1710371. SPOKE (https://spoke.ucsf.edu) development was funded in substantial part by the NSF Convergence Accelerator awards 1937160 and 12033569. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility supported under contract DE-AC05-00OR22725.
Keywords
- High-Performance Computing
- Parallel Algorithms
- Shortest Path Problem