Abstract
The CRISPR/Cas9 system is a powerful gene-editing tool whose specificity and stability rely on complex allosteric regulation. Understanding this regulation is essential for developing high-fidelity Cas9 variants with reduced off-target effects. Here, we used a novel structure-based machine learning (ML) approach to systematically identify long-range allosteric networks in Cas9. Our ML model was trained on all available Cas9 structures to provide a comprehensive representation of Cas9’s structural landscape. We then applied this model to Streptococcus pyogenes Cas9 (SpCas9) to demonstrate the feature selection process. Using Cα–Cα inter-residue distances, we mapped key allosteric networks and refined them through a two-stage SHAP feature selection (FS) strategy, reducing a vast feature space to 28 critical lysine–arginine (Lys–Arg) residue pairs that mediate SpCas9 interdomain communication, stability, and specificity. These Lys–Arg pairs initially shared a 46.5 Å inter-residue distance, but molecular dynamics (MD) simulations revealed distinct stabilization behaviors, indicating a hierarchical allosteric network. Further mutational analysis of R78A–K855A (M1) and R765A–K1246A (M2) identified an “electrostatic valley,” a stabilizing network in which positively charged residues interact with negatively charged DNA to maintain SpCas9’s structural integrity. Disrupting this valley through direct (M2) or allosteric (M1) mutations destabilized the DNA-bound conformation of SpCas9, leading to distinct pathways for improving its specificity. This study provides a new framework for understanding allostery in Cas9 by integrating ML-driven structural analysis with MD simulations. By identifying key allosteric residues and introducing the electrostatic valley as a central concept, we offer a rational strategy for engineering high-fidelity Cas9 variants. Beyond Cas9, our approach can be applied to uncover allosteric hotspots in other enzymes and to guide rational protein design.
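The selected features are Lys–Arg residue pairs characterized by their Cα–Cα distances. The sketch below illustrates, under stated assumptions, how such pairwise distances could be enumerated from Cα coordinates. The `Residue` record, the `lys_arg_ca_distances` function, and the toy coordinates and residue numbers are hypothetical illustrations, not the authors’ code or real SpCas9 data; structure-file parsing and any encoding of distances into the model’s input features are deliberately left out.

```python
import math
from itertools import product

# Hypothetical minimal residue record: (three-letter name, residue number, Cα x/y/z in Å).
Residue = tuple[str, int, tuple[float, float, float]]


def lys_arg_ca_distances(residues: list[Residue]) -> dict[tuple[str, str], float]:
    """Return the Cα–Cα distance (Å) for every Arg–Lys residue pair."""
    lys = [r for r in residues if r[0] == "LYS"]
    arg = [r for r in residues if r[0] == "ARG"]
    distances = {}
    # product() pairs every arginine with every lysine exactly once.
    for (_, r_num, r_xyz), (_, k_num, k_xyz) in product(arg, lys):
        distances[(f"R{r_num}", f"K{k_num}")] = math.dist(r_xyz, k_xyz)
    return distances


# Toy coordinates and residue numbers for illustration only; NOT real SpCas9 positions.
residues = [
    ("ARG", 10, (3.0, 4.0, 0.0)),
    ("LYS", 20, (0.0, 0.0, 0.0)),
    ("GLY", 30, (1.0, 1.0, 1.0)),
    ("ARG", 40, (0.0, 0.0, 12.0)),
]
for pair, d in lys_arg_ca_distances(residues).items():
    print(pair, f"{d:.1f} Å")
```

Because every arginine is paired with every lysine, a structure with n_K lysines and n_R arginines yields n_K × n_R candidate pairs, which is why the raw Lys–Arg feature space grows quickly before feature selection.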
| Original language | English |
|---|---|
| Article number | 169538 |
| Journal | Journal of Molecular Biology |
| Volume | 438 |
| Issue number | 2 |
| DOIs | |
| State | Published - Jan 15 2026 |
| Externally published | Yes |
Funding
This work is supported by a grant from the National Institute of General Medical Sciences of the National Institutes of Health (R21GM144860). The datasets of Cas9 and non-Cas9 proteins are provided as supporting information in the file “Datasets.xls”. The global ranking of all bits from the first-round FS is provided as supporting information in the file “Ranks of first-round bits.xls”. The individual SHAP feature importances of all first-round bits in each of the 15 splits, together with their average feature importances, are provided as supporting information in the file “Individual SHAP values of first-round bits in 15 runs.xls”. The 10,242 lysine–arginine pairs of SpCas9, and the lysine–arginine pairs in the top bits of NmeCas9 and CjCas9 obtained in the first-round FS, are provided in the file “10242 pairs_Sp_Nme_Cj.xls”. The Python code used for preprocessing, feature selection, and RF modeling is available at https://github.com/Sireesiru/Preprocessing-codes-for-Cas-NonCas-classification.
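The supporting files describe SHAP importances computed in each of 15 splits and then averaged into a global ranking of first-round bits. A minimal sketch of that averaging-and-ranking step follows, assuming each split’s per-bit importances (for example, mean absolute SHAP values) have already been computed. The function names, bit labels, toy values, and the rule of treating a bit missing from a split as zero importance are hypothetical choices, not taken from the authors’ repository; only three splits are used to keep the example short.

```python
from statistics import mean


def rank_by_mean_importance(per_split: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Average each bit's importance over all splits and rank from high to low."""
    bits = set().union(*per_split)  # every bit that appears in at least one split
    # Assumption: a bit missing from a split contributes zero importance there.
    averaged = {b: mean(split.get(b, 0.0) for split in per_split) for b in bits}
    # Sort by descending mean importance; break ties alphabetically for determinism.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


def select_top(ranked: list[tuple[str, float]], k: int) -> list[str]:
    """Keep the names of the k highest-ranked bits (one round of selection)."""
    return [name for name, _ in ranked[:k]]


# Toy importances for three hypothetical splits (the study used 15).
splits = [
    {"bit_a": 0.30, "bit_b": 0.10, "bit_c": 0.05},
    {"bit_a": 0.20, "bit_b": 0.40},
    {"bit_a": 0.25, "bit_b": 0.10, "bit_c": 0.15},
]
ranked = rank_by_mean_importance(splits)
print(select_top(ranked, 2))  # ['bit_a', 'bit_b']
```

Averaging over splits damps the effect of any single train/test partition on a bit's rank. In a two-stage scheme, the same ranking could be repeated on the retained subset; the exact criteria the authors used for each stage are not described in this record.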
Keywords
- CRISPR-Cas9
- allosteric network
- classification
- electrostatic valley
- machine learning
Fingerprint
Dive into the research topics of 'Structure-Based Classification of CRISPR/Cas9 Proteins: A Machine Learning Approach to Elucidating Cas9 Allostery'. Together they form a unique fingerprint.