TY - GEN
T1 - Data-Driven Whole-Genome Clustering to Detect Geospatial, Temporal, and Functional Trends in SARS-CoV-2 Evolution
AU - Merlet, Jean
AU - Lagergren, John
AU - Melesse Vergara, Verónica
AU - Cashman, Mikaela
AU - Bradburne, Christopher
AU - Plowright, Raina
AU - Gurley, Emily
AU - Joubert, Wayne
AU - Jacobson, Daniel
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/6/26
Y1 - 2023/6/26
N2 - Current methods for defining SARS-CoV-2 lineages ignore the vast majority of the SARS-CoV-2 genome. We develop and apply an exhaustive vector comparison method that directly compares all known SARS-CoV-2 genome sequences to produce novel lineage classifications. We utilize data-driven models that (i) accurately capture the complex interactions across the set of all known SARS-CoV-2 genomes, (ii) scale to leadership-class computing systems, and (iii) enable tracking how such strains evolve geospatially over time. We show that during the height of the original Omicron surge, countries across Europe, Asia, and the Americas had a spatially asynchronous distribution of Omicron sub-strains. Moreover, neighboring countries were often dominated by either different clusters of the same variant or different variants altogether throughout the pandemic. Analyses of this kind may suggest a different pattern of epidemiological risk than was understood from conventional data, as well as produce actionable insights and transform our ability to prepare for and respond to current and future biological threats.
KW - SARS-CoV-2
KW - biological networks
KW - high performance computing
UR - http://www.scopus.com/inward/record.url?scp=85166239727&partnerID=8YFLogxK
U2 - 10.1145/3592979.3593425
DO - 10.1145/3592979.3593425
M3 - Conference contribution
AN - SCOPUS:85166239727
T3 - Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2023
BT - Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2023
PB - Association for Computing Machinery, Inc
T2 - 2023 Platform for Advanced Scientific Computing Conference, PASC 2023
Y2 - 26 June 2023 through 28 June 2023
ER -