Abstract
We are motivated by newly proposed methods for data mining large-scale corpora of scholarly publications, such as the full biomedical literature, which may consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover how concepts relate to one another. They construct graph representations from annotated text databases and then formulate the relationship-mining problem as one of computing all-pairs shortest paths (APSP), which becomes a significant bottleneck. In this context, we present a new high-performance algorithm and implementation of the Floyd-Warshall algorithm for distributed-memory parallel computers accelerated by GPUs, which we call DSNAPSHOT (Distributed Accelerated Semiring All-Pairs Shortest Path). For our largest experiments, we ran DSNAPSHOT on a connected input graph with millions of vertices using 4,096 nodes (24,576 GPUs) of the Oak Ridge National Laboratory's Summit supercomputer system. We find DSNAPSHOT achieves a sustained performance of 136 × 10^15 floating-point operations per second (136 petaflop/s) at a parallel efficiency of 90% under weak scaling and, in absolute speed, 70% of the best possible performance given our computation (in the single-precision tropical semiring or 'min-plus' algebra). Looking forward, we believe this novel capability will enable the mining of scholarly knowledge corpora when embedded and integrated into artificial intelligence-driven natural language processing workflows at scale.
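As a point of reference for the computation the abstract describes, the following is a minimal single-node sketch of the textbook Floyd-Warshall recurrence in the min-plus (tropical) semiring, where "addition" is `min` and "multiplication" is `+`. It is not the DSNAPSHOT implementation, which is distributed and GPU-accelerated; the function name `floyd_warshall_min_plus` and the four-vertex example graph are illustrative assumptions, and the sketch assumes non-negative edge weights with a zero diagonal.

```python
import math

INF = math.inf  # "no edge" marker; the additive identity of the min-plus semiring


def floyd_warshall_min_plus(dist):
    """Return all-pairs shortest-path distances for a dense distance matrix.

    dist[i][j] is the weight of edge i -> j, INF if absent, and 0 on the diagonal.
    The input is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):  # allow vertex k as an intermediate hop
        row_k = d[k]
        for i in range(n):
            d_ik = d[i][k]
            if d_ik == INF:
                continue  # no path i -> k, so k cannot improve row i
            row_i = d[i]
            for j in range(n):
                # min-plus update: d[i][j] = min(d[i][j], d[i][k] + d[k][j])
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return d


# Hypothetical 4-vertex directed graph used only for illustration.
example = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
result = floyd_warshall_min_plus(example)
for row in result:
    print(row)
```

The triple loop performs on the order of n^3 min-plus updates for n vertices, which is why APSP becomes the bottleneck at the graph sizes mentioned in the abstract.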
Original language | English |
---|---|
Title of host publication | Proceedings of SC 2020 |
Subtitle of host publication | International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | IEEE Computer Society |
ISBN (Electronic) | 9781728199986 |
DOIs | |
State | Published - Nov 2020 |
Event | 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 - Virtual, Atlanta, United States; Duration: Nov 9 2020 → Nov 19 2020 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
Volume | 2020-November |
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Conference
Conference | 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 |
---|---|
Country/Territory | United States |
City | Virtual, Atlanta |
Period | 11/9/20 → 11/19/20 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Robinson Pino, program manager, under contract number DE-AC05-00OR22725, as well as by the U.S. National Science Foundation (NSF) under Award Numbers 1533768 and 1710371. We would like to thank Dr. Oded Green for his help in analysing the performance of our GPU kernels.
Funders | Funder number |
---|---|
U.S. Department of Energy | |
Office of Science | |
Advanced Scientific Computing Research | DE-AC05-00OR22725 |
National Science Foundation | 1710371 |
National Science Foundation | 1533768 |
Keywords
- High Performance Computing
- Parallel Algorithms
- Shortest path problem
Datasets
- Scalable Knowledge-Graph Analytics at 136 Petaflop/s
  Herrmannova, D. (Creator), Kannan, R. (Creator), Sao, P. (Creator), Lu, H. (Creator), Potok, T. (Creator), Thakkar, V. (Creator) & Vuduc, R. (Creator), Constellation by Oak Ridge Leadership Computing Facility (OLCF), Aug 7 2020
  Dataset