Topological Analysis for Sequence Variability: Case Study on more than 2K SARS-CoV-2 sequences of
COVID-19 infected 54 countries in comparison with SARS-CoV-1 and MERS-CoV


Jnanendra Prasad Sarkar1,2,+, Indrajit Saha3,+,*, Arijit Seal4, Debasree Maity5, Ujjwal Maulik2


1Larsen & Toubro Infotech Ltd., Pune, India
2Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
3Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
4Cognizant Technology Solutions Pvt. Ltd., Kolkata, India
5MCKV Institute of Engineering, Liluah, Howrah, India
*Correspondence should be addressed to the team leader: indrajit@nitttrkol.ac.in
+These team members contributed equally to this work



ABSTRACT

The pandemic caused by the novel coronavirus SARS-CoV-2 is a serious global concern. More than a thousand new COVID-19 infections are reported daily across the globe. Medical research communities are therefore trying to find remedies that restrict the spread of the virus, while vaccine development continues in parallel. In this critical situation, not only the medical research community but also scientists in fields such as microbiology, pharmacy, bioinformatics and data science are contributing to accelerate vaccine development, virus prediction, and forecasting of the transmission probability and reproduction cases of the virus for social awareness. In this context, we study the sequence variability of the virus, focusing on three aspects: (a) sequence variability among SARS-CoV-1, MERS-CoV and SARS-CoV-2 in the human host, which belong to the same coronavirus family, (b) sequence variability of SARS-CoV-2 in the human host across 54 countries and (c) sequence variability between the coronavirus family and country-specific SARS-CoV-2 sequences in the human host. For this purpose, as a case study, we perform a topological analysis of 2391 global genomic sequences of SARS-CoV-2 together with SARS-CoV-1 and MERS-CoV using an integrated semi-alignment based computational technique. Experimentally and statistically, the results of the semi-alignment based technique are found to be similar to those of the alignment based technique, while being computationally faster to obtain. Moreover, the outcome of this analysis can help to identify nations with homogeneous SARS-CoV-2 sequences, so that the same vaccine can be applied to their heterogeneous human populations.

Supplementary


Datasets


SARS-CoV-1; MERS-CoV; SARS-CoV-2

Code


The code/technique/algorithm is developed in MATLAB and is available in zipped form here. Use of the code/technique/algorithm is free as long as it is for an academic and non-commercial purpose. If you use this code/technique/algorithm, please cite this work.

For any query regarding the code/technique/algorithm, please email indrajit@nitttrkol.ac.in

Disclaimer:
The virus genomes are from public databases like NCBI and GISAID. Thus, NITTTR, Kolkata does not bear any responsibility for them.