COVID-Mutation-India

Genome-wide analysis of Indian SARS-CoV-2 Genomes for the
Identification of Genetic Mutation and SNP

Indrajit Saha¹,⁺,*, Nimisha Ghosh²,⁺, Debasree Maity³, Nikhil Sharma⁴, Jnanendra Prasad Sarkar⁵,⁶, Kaushik Mitra⁷

¹Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
²Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, India
³MCKV Institute of Engineering, Liluah, Howrah, India
⁴Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, India
⁵Larsen & Toubro Infotech, Pune, India
⁶Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
⁷Department of Community Medicine, Burdwan Medical College, Barddhaman, India
*Correspondence should be addressed to the team leader: indrajit@nitttrkol.ac.in
⁺These team members contributed equally to this work

ABSTRACT

The COVID-19 pandemic is a major threat to the human population. At present, the world is going through different phases of lockdown to stop the spread of the pandemic, and India is no exception: its lockdown started on 23rd March, 2020. In the current situation, apart from social distancing, only a vaccine can be a proper solution for protecting the population. It is therefore important for all nations to perform genome-wide analysis to identify the genetic variation in Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) so that a proper vaccine can be designed. This fact motivated us to analyze 566 publicly available complete or near-complete Indian SARS-CoV-2 genomes to find mutation points in the form of substitutions, deletions and insertions. To this end, we performed multiple sequence alignment in the presence of the reference sequence from NCBI. After the alignment, a consensus sequence was built to analyze each genome and identify its mutation points. As a result, we found 933 substitutions, 2449 deletions and 2 insertions, 3384 unique mutation points in total, in 566 genomes across 29.9K bp. These were further classified into three groups: 100 clusters of mutations (mostly deletions), 1609 point mutations (substitutions, deletions and insertions) and 64 SNPs. These outcomes are visualized using BioCircos and bar plots, as well as by plotting the entropy value of each genomic location. Moreover, phylogenetic analysis was performed to study the evolution of SARS-CoV-2 in India. The resulting tree also shows wide variation, which is indeed evident in the genomic analysis. Finally, these SNPs can be useful targets for virus classification and for designing and defining the effective dose of a vaccine for the heterogeneous population.
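
The two per-position computations mentioned in the abstract, classifying mutation points and computing an entropy value for each genomic location, can be illustrated with a minimal Python sketch. This is not the repository's actual code. It assumes the sequences are already aligned (equal length, gaps written as "-"), compares each genome directly against the aligned reference rather than against a consensus sequence, skips ambiguous "N" bases, and uses Shannon entropy in bits. The function names call_mutations, column_entropy and alignment_entropy are hypothetical.

```python
import math
from collections import Counter


def call_mutations(ref_aln, sample_aln):
    """Classify differences between an aligned sample and the aligned reference.

    Returns (kind, ref_position, ref_base, sample_base) tuples, where
    ref_position is 1-based in the ungapped reference. An insertion is
    reported at the reference position that precedes it.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == s or s == "N":  # assumption: ambiguous bases are ignored
            continue
        if r == "-":
            kind = "insertion"
        elif s == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        mutations.append((kind, ref_pos, r, s))
    return mutations


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy of every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


if __name__ == "__main__":
    print(call_mutations("ATG-CA", "ACGT-A"))
    print(alignment_entropy(["AA", "AC"]))
```

A column in which every genome carries the same base has entropy 0, so positions with high entropy are the ones where the genomes disagree most.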

Supplementary

dataset

code