Palindromic Target Site Identification in SARS-CoV-2, MERS-CoV and
SARS-CoV-1 by Adopting CRISPR-Cas Technique


Nimisha Ghosh1,+, Indrajit Saha2,+,*, Nikhil Sharma3


1Department of Computer Science and Information Technology, Institute of Technical Education and Research,
Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, India
2Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
3Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, India
*Correspondence should be addressed to the team leader: indrajit@nitttrkol.ac.in
+These team members contributed equally to this work



ABSTRACT

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) together with the associated Cas proteins (CRISPR-Cas) have become an important tool for the rapid detection of viruses. A target site in a viral genome can be located by identifying a 3-6 nt Protospacer Adjacent Motif (PAM) adjacent to it, which motivates us to adopt the CRISPR-Cas technique to identify SARS-CoV-2 as well as other members of the Coronaviridae family. In this regard, we have developed a fast and effective method that uses a k-mer technique to identify PAMs by scanning the whole genome of the respective virus. Subsequently, palindromic sequences adjacent to the PAM locations are identified as the potential target sites. Palindromes are considered in this work as they are known to identify viruses. Once all the palindrome-PAM combinations are identified, the PAMs that the RNA-guided DNA endonucleases Cas9 and Cas12 require in order to bind and cut the target sites are selected. In this regard, PAMs such as 5'-TGG-3' and 5'-TTTA-3' in NSP3 and Exon for SARS-CoV-2, 5'-GGG-3' and 5'-TGG-3' in Exon and NSP2 for MERS-CoV, and 5'-AGG-3' and 5'-TTTG-3' in Helicase and NSP3 respectively for SARS-CoV-1 are identified, corresponding to the SpCas9 and FnCas12a endonucleases. Finally, to recognise the target sites of the Coronaviridae family as cleaved by SpCas9 and FnCas12a, complements of the palindromic target regions are designed as primers or guide RNA (gRNA). Such complementary gRNAs, along with the respective Cas proteins, can therefore be considered in assays for the identification of SARS-CoV-2, MERS-CoV and SARS-CoV-1.
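
The scanning idea described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' MATLAB implementation. Several points are assumptions made only for illustration: a "palindrome" is taken to be a sequence equal to its own reverse complement, the SpCas9 PAM is modelled as the 3-mer NGG lying immediately 3' of the target, the Cas12a PAM is simplified to the 4-mer TTTV lying immediately 5' of the target, and the function names, the `target_len` parameter and the toy sequence are all hypothetical.

```python
# Illustrative sketch only; names, PAM patterns and target length are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-wise complement, used here as the guide/primer candidate."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]


def is_palindrome(seq):
    # Assumption: palindromic means equal to its own reverse complement.
    return set(seq) <= set("ACGT") and seq == reverse_complement(seq)


def find_pam_palindromes(genome, target_len=6):
    """Return (enzyme, pam, pam_start, target) tuples, 0-based positions."""
    genome = genome.upper()
    n = len(genome)
    hits = []

    # SpCas9-style scan: slide a 3-mer window and keep NGG;
    # the candidate target is the window just upstream (5') of the PAM.
    for i in range(target_len, n - 2):
        kmer = genome[i:i + 3]
        if kmer[0] in "ACGT" and kmer[1:] == "GG":
            target = genome[i - target_len:i]
            if is_palindrome(target):
                hits.append(("SpCas9", kmer, i, target))

    # Cas12a-style scan: slide a 4-mer window and keep TTTV (V = A, C or G);
    # the candidate target is the window just downstream (3') of the PAM.
    for i in range(n - 3 - target_len):
        kmer = genome[i:i + 4]
        if kmer[:3] == "TTT" and kmer[3] in "ACG":
            target = genome[i + 4:i + 4 + target_len]
            if is_palindrome(target):
                hits.append(("Cas12a", kmer, i, target))

    return hits


if __name__ == "__main__":
    toy = "CCGAATTCTGGACATTTAGGATCCAT"  # made-up sequence, not viral data
    for enzyme, pam, pos, target in find_pam_palindromes(toy):
        print(enzyme, pam, pos, target, "guide candidate:", complement(target))
```

On the toy sequence, the sketch reports the palindrome GAATTC upstream of a TGG PAM and the palindrome GGATCC downstream of a TTTA PAM. The AGG occurrence is skipped because the window next to it is not palindromic. A realistic run would use the full genome and a target length matched to the chosen Cas protein.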

Supplementary


dataset


code


The algorithm is implemented in MATLAB. The code is available in zipped form here. The code, technique and algorithm are free to use for academic and non-commercial purposes. If you use them, please cite this work.

For any queries regarding the algorithm, please email indrajit@nitttrkol.ac.in