Proteome-scale Deployment of Protein Structure Prediction Workflows on the Summit Supercomputer

Mu Gao, Mark Coletti, Russell B. Davidson, Ryan Prout, Subil Abraham, Benjamin Hernandez, Ada Sedova

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Scopus citations

Abstract

Deep learning has contributed to major advances in the prediction of protein structure from sequence, a fundamental problem in structural bioinformatics. With predictions now approaching the accuracy of crystallographic experiments, and with accelerators like GPUs and TPUs making inference using large models rapid, genome-level structure prediction becomes an obvious aim. Leadership-class computing resources can be used to perform genome-scale protein structure prediction using state-of-the-art deep learning models, providing a wealth of new data for systems biology applications. Here we describe our efforts to efficiently deploy the AlphaFold v.2 program, for full-proteome structure prediction, at scale on the Oak Ridge Leadership Computing Facility's resources, including the Summit supercomputer. We performed inference to produce the predicted structures for 40,526 protein sequences, corresponding to four prokaryotic proteomes and one plant proteome, using under 4,400 total Summit node hours, equivalent to using the majority of the supercomputer for a little over one hour. We also designed an optimized structure refinement that reduced the time for the relaxation stage of the AlphaFold pipeline by over 10X for longer sequences. We demonstrate the types of analyses that can be performed on proteome-scale collections of sequences, including a search for novel quaternary structures and implications for functional annotation.
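As a rough sanity check on the throughput figures quoted in the abstract, the arithmetic can be sketched as follows. The sequence count and the node-hour upper bound come from the abstract; the Summit node count of 4,608 and the 85% "majority" fraction are assumptions made here for illustration, and the function names are hypothetical.

```python
# Figures quoted in the abstract (the node-hour value is an upper bound).
SEQUENCES = 40_526
NODE_HOURS_UPPER_BOUND = 4_400

# Assumptions, not stated in the abstract: the commonly cited Summit size
# of 4,608 nodes, and 85% as one illustrative reading of "the majority".
ASSUMED_SUMMIT_NODES = 4_608
ASSUMED_MAJORITY_FRACTION = 0.85


def node_minutes_per_sequence(node_hours: float, sequences: int) -> float:
    """Average node-minutes spent per predicted sequence."""
    return node_hours * 60 / sequences


def wall_clock_hours(node_hours: float, nodes_used: int) -> float:
    """Wall-clock hours if the work were spread evenly over nodes_used nodes."""
    return node_hours / nodes_used


per_sequence = node_minutes_per_sequence(NODE_HOURS_UPPER_BOUND, SEQUENCES)
majority_nodes = round(ASSUMED_SUMMIT_NODES * ASSUMED_MAJORITY_FRACTION)
majority_hours = wall_clock_hours(NODE_HOURS_UPPER_BOUND, majority_nodes)

print(f"Upper bound per sequence: {per_sequence:.2f} node-minutes")
print(f"Wall clock on {majority_nodes} nodes: {majority_hours:.2f} hours")
```

Under these assumptions the budget works out to roughly 6.5 node-minutes per sequence on average, and to a little over one hour of wall-clock time on about 85% of the machine, which is consistent with the abstract's description. Real runs would not distribute evenly, since inference time depends strongly on sequence length.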

Original language: English
Title of host publication: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 206-215
Number of pages: 10
ISBN (Electronic): 9781665497473
State: Published - 2022
Event: 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 - Virtual, Online, France
Duration: May 30 2022 to Jun 3 2022

Publication series

Name: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022

Conference

Conference: 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Country/Territory: France
City: Virtual, Online
Period: 05/30/22 to 06/3/22

Funding

This research was sponsored in part by the Office of Biological and Environmental Research’s Genomic Science program within the US Department of Energy Office of Science, under award number ERKP917, and by the Laboratory Directed Research and Development Program at Oak Ridge National Laboratory (ORNL). It used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725, granted in part by the Advanced Scientific Computing Research (ASCR) Leadership Computing Challenge (ALCC) program, and resources supported by the Partnership for an Advanced Computing Environment (PACE) at Georgia Tech. We thank Bryan Piatkowski, Jerry Parks and Justin North for genome information.

Notice: This manuscript has been authored in part by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders (funder number)
U.S. Department of Energy
Office of Science (DE-AC05-00OR22725, ERKP917)
Advanced Scientific Computing Research
Biological and Environmental Research
Oak Ridge National Laboratory

Keywords

• deep learning
• high-performance computing
• protein structure prediction
• proteomics
• workflow management software
