Abstract
CRISPR-Cas9 tools have transformed genetic manipulation capabilities in the laboratory. Empirical rules-of-thumb have been developed for only a narrow range of model organisms, and mechanistic underpinnings for sgRNA efficiency remain poorly understood. This work establishes a novel feature set and new public resource, produced with quantum chemical tensors, for interpreting and predicting sgRNA efficiency. Feature engineering for sgRNA efficiency is performed using an explainable-artificial intelligence model: iterative Random Forest (iRF). By encoding quantitative attributes of position-specific sequences for Escherichia coli sgRNAs, we identify important traits for sgRNA design in bacterial species. Additionally, we show that expanding positional encoding to quantum descriptors of base-pair, dimer, trimer, and tetramer sequences captures intricate interactions in local and neighboring nucleotides of the target DNA. These features highlight variation in CRISPR-Cas9 sgRNA dynamics between E. coli and H. sapiens genomes. These novel encodings of sgRNAs enhance our understanding of the elaborate quantum biological processes involved in CRISPR-Cas9 machinery.
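The position-specific encoding of single bases, dimers, trimers, and tetramers described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain binary indicators as a stand-in for the quantum chemical descriptors, and the function names, feature-key format, and example spacer sequence are all hypothetical.

```python
BASES = "ACGT"


def positional_kmers(seq, k):
    """Return (position, k-mer) pairs for every window of length k in seq."""
    seq = seq.upper()
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def encode_positional(seq, ks=(1, 2, 3, 4)):
    """Build a sparse binary feature dict keyed by 'pos{i}_{kmer}'.

    Assumption: each (position, k-mer) pair becomes one indicator feature.
    The paper attaches quantum chemical descriptors to these units instead;
    the indicator value 1 here is only a placeholder for such a descriptor.
    """
    features = {}
    for k in ks:
        for i, kmer in positional_kmers(seq, k):
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set(BASES):
                features[f"pos{i}_{kmer}"] = 1
    return features


if __name__ == "__main__":
    guide = "GACGTTAGCCATGGATCGAT"  # hypothetical 20-nt spacer, not from the paper
    feats = encode_positional(guide)
    print(len(feats), "positional features")
    print(sorted(feats)[:5])
```

A feature dictionary of this kind could be vectorized and passed to a tree-based model, whose feature importances would then point to the positions and k-mers that matter most for predicted efficiency.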
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 10147-10161 |
Number of pages | 15 |
Journal | Nucleic Acids Research |
Volume | 51 |
Issue number | 19 |
DOIs | |
State | Published - Oct 27 2023 |
Funding
Secure Ecosystem & Engineering Design Science Focus Area is sponsored by the Genomic Science Program, U.S. Department of Energy, Office of Science, Biological and Environmental Research [FWP ERKPA17]; Center for Bioenergy Innovation, a DOE Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science; Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. Department of Energy [DE-AC05-00OR45678]; U.S. Department of Energy, Office of Science, through the Genomic Science Program, Office of Biological and Environmental Research [FWP ERKP123]; Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy; Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]; this research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]. Funding for open access charge: Secure Ecosystem & Engineering Design Science Focus Area is sponsored by the Genomic Science Program, U.S. Department of Energy, Office of Science, Biological and Environmental Research [FWP ERKPA17]. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Funders | Funder number |
---|---|
CADES | |
DOE Bioenergy Research Center | |
Data Environment for Science | |
Laboratory Directed Research | DE-AC05-00OR22725 |
U.S. Department of Energy | DE-AC05-00OR45678, FWP ERKP123 |
Office of Science | |
Biological and Environmental Research | FWP ERKPA17 |
Oak Ridge National Laboratory | |
Center for Bioenergy Innovation | |