SARS-CoV2 billion-compound docking

David M. Rogers, Rupesh Agarwal, Josh V. Vermaas, Micholas Dean Smith, Rajitha T. Rajeshwar, Connor Cooper, Ada Sedova, Swen Boehm, Matthew Baker, Jens Glaser, Jeremy C. Smith

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

This dataset contains ligand conformations and docking scores for 1.4 billion molecules docked against 6 structural targets from SARS-CoV-2, representing 5 unique proteins: MPro, NSP15, PLPro, RDRP, and the Spike protein. Docking was carried out with the AutoDock-GPU platform on the Summit supercomputer and Google Cloud. The docking procedure used the Solis-Wets search method to generate 20 independent ligand binding poses per compound. Each ligand pose was scored with the AutoDock free-energy estimate and rescored with the RFScore v3 and DUD-E machine-learned rescoring models. The input protein structures are included and are suitable for use with AutoDock-GPU and other docking programs. As the output of an exceptionally large docking campaign, this dataset is a valuable resource for discovering trends across small molecules and protein binding sites, for training AI models, and for comparison with inhibitor compounds targeting SARS-CoV-2. The work also illustrates how to organize and process data from ultra-large docking screens.
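As a minimal sketch of the kind of post-processing described above, and not the authors' actual pipeline, the Python below reduces the 20 poses per compound to the single lowest AutoDock free-energy estimate (more negative is better) for each compound and target, then ranks compounds for one target. The `Pose` record, its field names, and the helper functions are hypothetical illustrations; the dataset's real file layout is not described on this page.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical record for one docked pose (field names are illustrative)."""
    compound_id: str
    target: str             # e.g. "MPro", "NSP15"
    autodock_score: float   # AutoDock free-energy estimate; lower (more negative) is better


def best_pose_per_compound(poses: Iterable[Pose]) -> Dict[Tuple[str, str], Pose]:
    """Keep only the lowest-energy pose for each (compound, target) pair.

    Streams over the poses once, so memory grows with the number of
    compound/target pairs rather than with the total number of poses.
    """
    best: Dict[Tuple[str, str], Pose] = {}
    for p in poses:
        key = (p.compound_id, p.target)
        current = best.get(key)
        if current is None or p.autodock_score < current.autodock_score:
            best[key] = p
    return best


def top_k(best: Dict[Tuple[str, str], Pose], target: str, k: int) -> List[Pose]:
    """Return the k compounds with the lowest best-pose score for one target."""
    candidates = (p for p in best.values() if p.target == target)
    return heapq.nsmallest(k, candidates, key=lambda p: p.autodock_score)


if __name__ == "__main__":
    # Toy data standing in for real docking output (hypothetical values).
    poses = [
        Pose("c1", "MPro", -7.2),
        Pose("c1", "MPro", -8.1),
        Pose("c2", "MPro", -6.0),
        Pose("c3", "NSP15", -9.0),
    ]
    best = best_pose_per_compound(poses)
    for p in top_k(best, "MPro", 2):
        print(p.compound_id, p.autodock_score)
```

Using `heapq.nsmallest` avoids sorting every candidate when only a short hit list is needed, which matters when the candidate pool is in the billions; a real campaign would also shard this work across files or nodes.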

Original language: English
Article number: 173
Journal: Scientific Data
Volume: 10
Issue number: 1
State: Published - Dec 2023

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Computational work was supported by the Covid HPC Consortium and included allocations on both Oak Ridge Leadership Computing Facility and Google Cloud Compute services.
