Quantum biological insights into CRISPR-Cas9 sgRNA efficiency from explainable-AI driven feature engineering

Jaclyn M. Noshay, Tyler Walker, William G. Alexander, Dawn M. Klingeman, Jonathon Romero, Angelica M. Walker, Erica Prates, Carrie Eckert, Stephan Irle, David Kainer, Daniel A. Jacobson

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

CRISPR-Cas9 tools have transformed genetic manipulation capabilities in the laboratory. Empirical rules-of-thumb have been developed for only a narrow range of model organisms, and mechanistic underpinnings for sgRNA efficiency remain poorly understood. This work establishes a novel feature set and new public resource, produced with quantum chemical tensors, for interpreting and predicting sgRNA efficiency. Feature engineering for sgRNA efficiency is performed using an explainable-artificial intelligence model: iterative Random Forest (iRF). By encoding quantitative attributes of position-specific sequences for Escherichia coli sgRNAs, we identify important traits for sgRNA design in bacterial species. Additionally, we show that expanding positional encoding to quantum descriptors of base-pair, dimer, trimer, and tetramer sequences captures intricate interactions in local and neighboring nucleotides of the target DNA. These features highlight variation in CRISPR-Cas9 sgRNA dynamics between E. coli and H. sapiens genomes. These novel encodings of sgRNAs enhance our understanding of the elaborate quantum biological processes involved in CRISPR-Cas9 machinery.
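The positional encoding described in the abstract can be illustrated with a minimal sketch. It assumes a 20-nt guide sequence and uses a plain one-hot encoding of every monomer, dimer, trimer, and tetramer at every position. It is not the authors' pipeline: the quantum chemical descriptors and the iRF model are not reproduced here, and the function names (`positional_kmer_features`, `encode_sgrna`) and the example sequence are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def positional_kmer_features(seq, k):
    """One-hot encode each possible k-mer at each start position of seq."""
    seq = seq.upper()
    if set(seq) - set(BASES):
        raise ValueError("sequence may only contain A, C, G, T")
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    features = {}
    for pos in range(len(seq) - k + 1):
        window = seq[pos:pos + k]
        for kmer in kmers:
            # Feature name records the 1-based start position and the k-mer.
            features[f"pos{pos + 1}_{kmer}"] = 1 if window == kmer else 0
    return features


def encode_sgrna(seq, max_k=4):
    """Combine positional features for k = 1 (monomer) up to max_k (tetramer)."""
    features = {}
    for k in range(1, max_k + 1):
        features.update(positional_kmer_features(seq, k))
    return features


# Hypothetical 20-nt guide sequence, for illustration only.
example_features = encode_sgrna("GACGTTAGCCTAGGATCCAA")
print(len(example_features), "features,", sum(example_features.values()), "active")
```

In a descriptor-based variant, each 0/1 indicator would be replaced by numeric properties of the k-mer at that position. A feature table of this shape is the kind of input a tree-based model can rank by importance.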

Original language: English
Pages (from-to): 10147-10161
Number of pages: 15
Journal: Nucleic Acids Research
Volume: 51
Issue number: 19
State: Published - Oct 27 2023

Funding

Secure Ecosystem & Engineering Design Science Focus Area is sponsored by the Genomic Science Program, U.S. Department of Energy, Office of Science, Biological and Environmental Research [FWP ERKPA17]; Center for Bioenergy Innovation, a DOE Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science; Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. Department of Energy [DE-AC05-00OR45678]; U.S. Department of Energy, Office of Science, through the Genomic Science Program, Office of Biological and Environmental Research [FWP ERKP123]; Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy; Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]; this research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]. Funding for open access charge: Secure Ecosystem & Engineering Design Science Focus Area is sponsored by the Genomic Science Program, U.S. Department of Energy, Office of Science, Biological and Environmental Research [FWP ERKPA17]. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders (funder number, where listed)
CADES
DOE Bioenergy Research Center
Data Environment for Science
Laboratory Directed Research: DE-AC05-00OR22725
U.S. Department of Energy: DE-AC05-00OR45678, FWP ERKP123
Office of Science
Biological and Environmental Research: FWP ERKPA17
Oak Ridge National Laboratory
Center for Bioenergy Innovation
