Hotspot Mutations in SARS-CoV-2

Indrajit Saha1,+,*, Nimisha Ghosh2,+, Nikhil Sharma3, Suman Nandi1

1Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
2Department of Computer Science and Information Technology, Institute of Technical Education and Research,
Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, India
3Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
*Correspondence should be addressed to the team leader: indrajit@nitttrkol.ac.in
+These team members contributed equally to this work



ABSTRACT

Since its emergence in Wuhan, China, SARS-CoV-2 has spread very rapidly around the world, resulting in a global pandemic. Although vaccination has started, the number of COVID-19 patients is still quite large. Hence, the hotspot mutations of the different evolving virus strains need to be analysed. In this regard, multiple sequence alignment of 71038 SARS-CoV-2 genomes from 98 countries, over the period from January 2020 to June 2021, is performed using MAFFT, followed by phylogenetic analysis. These steps identify hotspot mutations, as deletions and substitutions in the coding regions, based on an entropy less than or equal to 0.3, giving a total of 45 unique hotspot mutations. Out of these 45 unique hotspot mutations, 39 non-synonymous deletions and substitutions are identified, with 9 unique amino acid changes for deletions and 22 unique amino acid changes for substitutions. Among these, some important mutations associated with the different SARS-CoV-2 variants of concern (as declared by WHO), such as Alpha, Beta, Gamma and Delta, include H69-, V70-, V70F, Y144-, A222V, N501Y, A570D, P681H and P681R. Moreover, 10286 Indian sequences out of the 71038 global SARS-CoV-2 sequences are considered as a demonstrative example, which gives 52 unique hotspot mutations resulting in 45 non-synonymous deletions and substitutions, with 5 unique amino acid changes for deletions and 36 unique amino acid changes for substitutions. Some important mutations in these sequences pertaining to the Delta variant of SARS-CoV-2 are T19R, T95I, G142D, E156-, F157-, L452R, T478K and P681R. Furthermore, the evolution of the hotspot mutations, along with the mutations in the variants of concern, is visualised and their characteristics are discussed. Finally, for all the non-synonymous substitutions (missense mutations), the functional consequences of the amino acid changes in the respective protein structures are assessed using PolyPhen and I-Mutant 2.0.
In addition, SSIPe is used to report the binding affinity between the receptor-binding domain of the Spike protein and the human ACE2 protein for the L452R, T478K, E484Q and N501Y hotspot mutations in that region.
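
The abstract gives the selection criterion (entropy less than or equal to 0.3) but not how the entropy is computed. The following minimal Python sketch assumes per-column Shannon entropy in bits over the characters of an aligned column (gaps counted as a symbol) and, as a further assumption, keeps only variable columns (entropy > 0) that satisfy the threshold. The function names are hypothetical and this is not the authors' MATLAB/Python code, which is available only on request.

```python
import math
from collections import Counter
from typing import List, Sequence

ENTROPY_THRESHOLD = 0.3  # cut-off quoted in the abstract


def column_entropy(column: Sequence[str], base: float = 2.0) -> float:
    """Shannon entropy of one alignment column (assumed definition, bits by default)."""
    total = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / total
        entropy -= p * math.log(p, base)
    return entropy


def hotspot_candidate_positions(
    alignment: Sequence[str], threshold: float = ENTROPY_THRESHOLD
) -> List[int]:
    """Hypothetical filter: 0-based columns that vary (entropy > 0) with entropy <= threshold."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    positions = []
    for i in range(length):
        h = column_entropy([seq[i] for seq in alignment])
        if 0.0 < h <= threshold:
            positions.append(i)
    return positions


if __name__ == "__main__":
    # Toy alignment of 20 sequences: column 0 is conserved, column 1 has one
    # rare substitution (entropy about 0.286), column 2 is split 50/50 (entropy 1.0).
    alignment = ["ACG"] * 10 + ["AC-"] * 9 + ["AT-"]
    print(hotspot_candidate_positions(alignment))  # [1]
```

Plain Python is used for clarity; for tens of thousands of full-length genomes, a column-wise array representation would be considerably faster.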

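Deciding whether a coding-region change is non-synonymous, and naming it in the notation used above (reference residue, position, new residue, with `-` for a deletion, as in N501Y or H69-), comes down to translating codons. Below is a minimal sketch using the standard genetic code. The helper names are hypothetical, and it assumes the reference and variant codons have already been extracted in frame from the alignment.

```python
from typing import Tuple

# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_codon_change(
    ref_codon: str, alt_codon: str, aa_position: int
) -> Tuple[str, bool]:
    """Return a label such as 'N501Y' and whether the change is non-synonymous.

    Codons containing gaps or ambiguous bases raise KeyError.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return f"{ref_aa}{aa_position}{alt_aa}", ref_aa != alt_aa


def deletion_label(ref_codon: str, aa_position: int) -> str:
    """Label an in-frame deletion of one whole codon, e.g. 'H69-'."""
    return f"{CODON_TABLE[ref_codon.upper()]}{aa_position}-"


if __name__ == "__main__":
    print(classify_codon_change("AAT", "TAT", 501))  # ('N501Y', True)
    print(classify_codon_change("CTT", "CTC", 10))   # ('L10L', False)
    print(deletion_label("CAT", 69))                 # H69-
```

A synonymous change such as CTT to CTC leaves the residue unchanged, which is why only changes where the two translated residues differ are counted as non-synonymous.
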
Evolution of SARS-CoV-2


      

Evolution of Global 71038 SARS-CoV-2 Genomes

      

Transmission of Global 71038 SARS-CoV-2 Genomes

      

Evolution of Indian 10286 SARS-CoV-2 Genomes

Supplementary


Datasets


Code


The algorithm is implemented in MATLAB and Python, and the code is available on request. The code/technique/algorithm may be used free of charge for academic and non-commercial purposes. If you use this code/technique/algorithm, please cite this work.

For any query regarding the algorithms, please email indrajit@nitttrkol.ac.in

Disclaimer:
The data are taken from public databases such as GISAID. Thus, NITTTR, Kolkata does not take any responsibility for their accuracy.