Data-Driven Whole-Genome Clustering to Detect Geospatial, Temporal, and Functional Trends in SARS-CoV-2 Evolution

Jean Merlet, John Lagergren, Verónica Melesse Vergara, Mikaela Cashman, Christopher Bradburne, Raina Plowright, Emily Gurley, Wayne Joubert, Daniel Jacobson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Current methods for defining SARS-CoV-2 lineages ignore the vast majority of the SARS-CoV-2 genome. We develop and apply an exhaustive vector comparison method that directly compares all known SARS-CoV-2 genome sequences to produce novel lineage classifications. We utilize data-driven models that (i) accurately capture the complex interactions across the set of all known SARS-CoV-2 genomes, (ii) scale to leadership-class computing systems, and (iii) enable tracking how such strains evolve geospatially over time. We show that during the height of the original Omicron surge, countries across Europe, Asia, and the Americas had a spatially asynchronous distribution of Omicron sub-strains. Moreover, neighboring countries were often dominated by either different clusters of the same variant or different variants altogether throughout the pandemic. Analyses of this kind may suggest a different pattern of epidemiological risk than was understood from conventional data, as well as produce actionable insights and transform our ability to prepare for and respond to current and future biological threats.

Original language: English
Title of host publication: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2023
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400701900
State: Published - Jun 26 2023
Event: 2023 Platform for Advanced Scientific Computing Conference, PASC 2023 - Davos, Switzerland
Duration: Jun 26 2023 to Jun 28 2023

Publication series

Name: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2023

Conference

Conference: 2023 Platform for Advanced Scientific Computing Conference, PASC 2023
Country/Territory: Switzerland
City: Davos
Period: 06/26/23 to 06/28/23

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work is supported as part of the Genomic Sciences Program DOE Systems Biology Knowledgebase (KBase) funded by the Office of Biological and Environmental Research's Genomic Science program within the US Department of Energy Office of Science under Award Numbers DE-AC02-05CH11231, DE-AC02-06CH11357, DE-AC05-00OR22725, and DE-AC02-98CH10886. This work was also supported by the U.S. National Science Foundation (EF-2133763). We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based. This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders: Funder number
DOE Public Access Plan
United States Government
National Science Foundation: EF-2133763
U.S. Department of Energy
Office of Science: DE-AC02-05CH11231, DE-AC02-98CH10886, DE-AC02-06CH11357, DE-AC05-00OR22725
Biological and Environmental Research
UT-Battelle

Keywords

• SARS-CoV-2
• biological networks
• high performance computing
