High-throughput virtual laboratory for drug discovery using massive datasets

Jens Glaser, Josh V. Vermaas, David M. Rogers, Jeff Larkin, Scott LeGrand, Swen Boehm, Matthew B. Baker, Aaron Scheinberg, Andreas F. Tillack, Mathialakan Thavappiragasam, Ada Sedova, Oscar Hernandez

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Time-to-solution for structure-based screening of massive chemical databases for COVID-19 drug discovery has been decreased by an order of magnitude, and a virtual laboratory has been deployed at scale on up to 27,612 GPUs on the Summit supercomputer, allowing an average molecular docking of 19,028 compounds per second. Over one billion compounds were docked to two SARS-CoV-2 protein structures with full optimization of ligand position and 20 poses per docking, each in under 24 hours. GPU acceleration and high-throughput optimizations of the docking program produced 350× mean speedup over the CPU version (50× speedup per node). GPU acceleration of both feature calculation for machine-learning based scoring and distributed database queries reduced processing of the 2.4 TB output by orders of magnitude. The resulting 50× speedup for the full pipeline reduces an initial 43 day runtime to 21 hours per protein for providing high-scoring compounds to experimental collaborators for validation assays.
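The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope consistency check, not part of the authors' pipeline. The function names are hypothetical, and the compound count of exactly one billion is an assumption taken as a lower bound for "over one billion". It also assumes that the quoted average rate of 19,028 compounds per second applies to a full docking run.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All helper names are illustrative; only the numbers come from the abstract.

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def pipeline_speedup(before_days: float, after_hours: float) -> float:
    """Ratio of the initial runtime to the accelerated runtime."""
    return before_days * HOURS_PER_DAY / after_hours


def docking_hours(n_compounds: float, compounds_per_second: float) -> float:
    """Wall-clock hours needed to dock n_compounds at a constant average rate."""
    return n_compounds / compounds_per_second / SECONDS_PER_HOUR


def per_gpu_rate(compounds_per_second: float, n_gpus: int) -> float:
    """Average docking rate per GPU, in compounds per second."""
    return compounds_per_second / n_gpus


# 43 days reduced to 21 hours per protein (full pipeline).
speedup = pipeline_speedup(before_days=43, after_hours=21)

# Assumption: exactly 1e9 compounds, a lower bound for "over one billion".
hours = docking_hours(n_compounds=1_000_000_000, compounds_per_second=19_028)

# 19,028 compounds per second spread over 27,612 GPUs.
rate = per_gpu_rate(compounds_per_second=19_028, n_gpus=27_612)

print(f"full-pipeline speedup: {speedup:.1f}x")
print(f"docking time for 1e9 compounds: {hours:.1f} h")
print(f"per-GPU rate: {rate:.2f} compounds/s")
```

Under these assumptions the pipeline speedup comes out at roughly 49×, in line with the quoted 50×. One billion dockings at the average rate take about 14.6 hours, which fits the "under 24 hours" claim. Each GPU handles a little under 0.7 compounds per second on average.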

Original language: English
Pages (from-to): 452-468
Number of pages: 17
Journal: International Journal of High Performance Computing Applications
Volume: 35
Issue number: 5
State: Published - Sep 2021

Funding

The authors thank Rupesh Agarwal for extensive support and assistance with input preparation and for scientific and methodological discussions, and Jeremy C. Smith and his group for support and inspiration. We also thank Duncan Poole, Geetika Gupta, Jon Lefman, and the rest of the NVIDIA team, as well as Diogo Santos-Martins and the Scripps Research team, for support and coordination. We thank the developers of BlazingSQL, Rodrigo Aramburu, William Malpica, and Felipe Aramburu, and NVIDIA RAPIDS for their support in deploying their database solutions to Summit. We thank Jason Kincl for help with the deployment of FireWorks on Marble. We thank Omar Demerdash for initial exploration of rescoring methods. We thank Jamey Kinney, Usman Qureshi, and Miles Euell (Google).

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was sponsored by the Laboratory Directed Research and Development Program at Oak Ridge National Laboratory (ORNL), which is managed by UT-Battelle, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC05-00OR22725. This work also used resources, services, and support provided via the COVID-19 HPC Consortium (https://covid19-hpc-consortium.org/), which is a unique private-public effort to bring together government, industry, and academic leaders who are volunteering free compute time and resources in support of COVID-19 research, and used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • GPU acceleration
  • High-throughput virtual screening
  • drug discovery
  • high-performance database query
