DIPS-Plus: The enhanced database of interacting protein structures for interface prediction

Alex Morehead, Chen Chen, Ada Sedova, Jianlin Cheng

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In this work, we expand on a dataset recently introduced for protein interface prediction (PIP), the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for machine learning of protein interfaces. While the original DIPS dataset contains only the Cartesian coordinates for atoms contained in the protein complex along with their types, DIPS-Plus contains multiple residue-level features including surface proximities, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, providing researchers a curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields new SOTA results, surpassing the performance of some of the latest models trained on residue-level and atom-level encodings of protein complexes to date.

Original language: English
Article number: 509
Journal: Scientific Data
Volume: 10
Issue number: 1
State: Published - Dec 2023

Funding

This project is partially supported by three NSF grants (DBI 2308699, DBI 1759934, and IIS 1763246), one NIH grant (GM093123), three DOE grants (DE-SC0020400, DE-AR0001213, and DE-SC0021303), and the computing allocation on the Andes compute cluster provided by Oak Ridge Leadership Computing Facility (Project ID: BIF132). In particular, this research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funder: Funder number
Oak Ridge National Laboratory: BIF132
National Science Foundation: DBI 1759934, DBI 2308699, IIS 1763246
National Institutes of Health: GM093123
U.S. Department of Energy: DE-AC05-00OR22725, DE-AR0001213, DE-SC0020400, DE-SC0021303
Office of Science
