Designing single guide RNA for CRISPR-Cas9 base editor by deep learning

Liangke Gou
MS, 2019
Wu, Yingnian
CRISPR base editors are programmable DNA editing systems that induce single-nucleotide changes in DNA using a fusion protein containing a catalytically defective Cas9, a cytidine or adenine deaminase, and an inhibitor of base excision repair. This genome editing approach has the advantage that it requires neither double-stranded DNA breaks nor a donor DNA template. Single guide RNAs (sgRNAs) enable precise editing at the designed region with a CRISPR base editor. Different sgRNAs differ widely in base editing efficacy, and computational prediction can facilitate the design of sgRNAs with high editing efficacy, sensitivity, and specificity. Here we present a convolutional neural network (CNN)-based approach to predict sgRNA editing efficiency for CRISPR base editing. First, we designed a large-scale library of over 7,000 sgRNAs to introduce premature stop codons into essential genes in yeast; because disrupting an essential gene is lethal, yeast cells carrying an efficient sgRNA drop out of the population. The base editing efficiency of these sgRNAs was measured as the log-ratio change in sgRNA abundance between the start of the experiment and 72 h after editing induction. We built a CNN model that takes the sgRNA sequence and the surrounding DNA context as training input and predicts the editing efficacy of any given sgRNA sequence. With architecture and parameter tuning, the CNN model outperformed the other machine learning approaches tested. In addition, the CNN model identified, in a fully automated and data-driven manner, sequence features that may affect sgRNA editing efficacy.
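The abstract does not give implementation details, so the following is only a minimal sketch of the three ingredients it describes: encoding a sequence for a CNN, scoring dropout as a log ratio of abundances, and applying one convolutional filter. Everything specific here is an assumption made for illustration and not taken from the thesis: the one-hot encoding over A/C/G/T, the direction of the ratio (72 h over start, so efficient sgRNAs score negative), the normalization by total read counts, the pseudocount, and all function names.

```python
import math

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot_encode(seq):
    """Encode a DNA sequence as a list of 4-element rows (A, C, G, T).

    Bases outside ACGT (for example N) are left as all-zero rows.
    """
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def log_ratio(count_start, count_end, total_start, total_end, pseudocount=1.0):
    """log2 change in normalized sgRNA abundance (assumed: end over start).

    Counts are divided by the total reads of each sample; the pseudocount
    (an assumption) avoids taking the log of zero for fully depleted sgRNAs.
    """
    freq_start = (count_start + pseudocount) / total_start
    freq_end = (count_end + pseudocount) / total_end
    return math.log2(freq_end / freq_start)


def conv1d_relu(x, kernel, bias=0.0):
    """Slide one filter along the sequence ('valid' padding), then apply ReLU.

    x is a one-hot sequence (length L, 4 channels); kernel has shape
    (width, 4). Returns L - width + 1 activations, one per window.
    """
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += x[start + offset][channel] * kernel[offset][channel]
        out.append(max(total, 0.0))
    return out
```

In this sketch, a filter whose weights equal the one-hot encoding of a motif responds most strongly at positions where that motif occurs, which is the sense in which a trained CNN can surface sequence features in a data-driven way. A full model would stack many learned filters, pooling, and dense layers, and would be fitted to the measured log ratios.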