Abstract
The explosion in the volume of data generated by ever-larger simulation campaigns, experiments, and observations necessitates competent tools for data wrangling and analysis. While the Oak Ridge Leadership Computing Facility (OLCF) provides a variety of tools for data wrangling and data analysis tasks, Python-based tools often lack scalability or the ability to fully exploit the computational capability of OLCF’s Summit supercomputer. NVIDIA RAPIDS and Dask offer a promising solution to accelerate and distribute data analytics workloads from personal computers to heterogeneous supercomputing systems. We discuss early performance evaluation results of RAPIDS and Dask on Summit to understand their capabilities, scalability, and limitations. Our evaluation includes a subset of RAPIDS libraries, i.e., cuDF, cuML, and cuGraph, and Chainer’s CuPy, and their multi-GPU variants when available. We also draw on the trends observed in the performance evaluation results to discuss best practices for maximizing performance.
| Original language | English |
|---|---|
| Title of host publication | Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, Revised Selected Papers |
| Editors | Jeffrey Nichols, Arthur ‘Barney’ Maccabe, Suzanne Parete-Koon, Becky Verastegui, Oscar Hernandez, Theresa Ahearn |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 366-380 |
| Number of pages | 15 |
| ISBN (Print) | 9783030633929 |
| DOIs | |
| State | Published - 2021 |
| Event | 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020 - Virtual, Online. Duration: Aug 26 2020 → Aug 28 2020 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 1315 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020 |
|---|---|
| City | Virtual, Online |
| Period | 08/26/20 → 08/28/20 |
Funding
B. Hernández et al. contributed equally. This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Acknowledgments. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- Data analytics
- GPU
- Multi-threaded
- Performance evaluation
- Python