About the Project

In the early days of the COVID-19 pandemic in India, the CRG Short-term Special Call on COVID-19 was launched by the Science and Engineering Research Board (SERB), a statutory body under the Department of Science and Technology (DST), Govt. of India. In response, the project "In Silico Analysis of 10000 Genomic Sequences of COVID-19 around the World including India to Identify Genetic Variability and potential Molecular Targets in Virus and Human" (Project No.: CVD/2020/000991) was approved for one year (July 2020 to 2021). The primary objectives of this project were to (a) identify the genetic variability in SARS-CoV-2 genomes around the globe, including India, (b) identify the number of virus strains using Single Nucleotide Polymorphism (SNP) data, (c) identify putative epitopes that are highly immunogenic and antigenic, drawn from conserved genomic regions, as candidates for a synthetic vaccine, and (d) identify potential target proteins of the virus and the human host based on Protein-Protein Interactions and by integrating the knowledge of genetic variability. Two further objectives were also considered: (e) distinguishing the coronavirus from other pathogenic viruses using machine learning, and (f) identifying virus miRNAs that are involved in regulating human mRNA, or vice versa. Together, these objectives explore the challenges of COVID-19 from multiple directions in order to give the best possible answer for combating the spread of SARS-CoV-2.


Whole genome analysis of more than 10000 SARS-CoV-2 virus unveils global genetic diversity and target region of NSP6

Whole genome analysis of SARS-CoV-2 is important for identifying its genetic diversity. Moreover, accurate detection of SARS-CoV-2 is required for correct diagnosis. To address these, we first analysed 10,664 publicly available complete or near-complete SARS-CoV-2 genomes from 73 countries to find mutation points in the coding regions, classified as substitutions, deletions, insertions and single nucleotide polymorphisms (SNPs), both globally and country-wise. Multiple sequence alignment is performed together with the reference sequence from NCBI. Once the alignment is done, a consensus sequence is built to analyse each genomic sequence and identify the unique mutation points. Globally, this results in 7209 substitutions, 11700 deletions, 119 insertions and 53 SNPs.
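
The comparison step described above can be illustrated with a minimal Python sketch. It assumes the genomes are already aligned (gaps written as "-") and compares one aligned genome with one aligned comparison sequence, column by column. The function name `classify_mutations`, the choice to skip ambiguous "N" bases, and the toy sequences are assumptions made for illustration and are not taken from the project. The sketch also does not separate SNPs from substitutions, because the criterion for that distinction is not given here.

```python
def classify_mutations(ref_aln, sample_aln):
    """Compare one aligned genome with an aligned comparison sequence.

    Returns a list of (position, kind, ref_base, sample_base) tuples.
    `position` is 1-based and counts only non-gap bases of the
    comparison sequence, so an insertion is reported at the position
    of the base that precedes it.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0  # comparison-sequence bases consumed so far
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == s or s == "N":
            continue  # identical column, or ambiguous base (assumption: ignored)
        if r == "-":
            kind = "insertion"     # base present only in the sample
        elif s == "-":
            kind = "deletion"      # base missing from the sample
        else:
            kind = "substitution"  # different base at the same site
        mutations.append((ref_pos, kind, r, s))
    return mutations


# Toy aligned pair (hypothetical data, not real SARS-CoV-2 sequence).
reference = "ATG-CCGTA"
sample = "ATGACC-TT"
for mutation in classify_mutations(reference, sample):
    print(mutation)
```

Running this per genome and collecting the distinct (position, kind) pairs would give unique mutation points of the kind counted above, although the project's actual pipeline may differ.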

Genome-wide analysis of 10664 SARS-CoV-2 genomes to identify virus strains in 73 countries based on single nucleotide polymorphism

Since the onslaught of SARS-CoV-2, the research community has been searching for a vaccine to fight this virus. During this period, however, the virus has mutated to adapt to different environmental conditions around the world, making the task of vaccine design more challenging. In this situation, the identification of virus strains is a timely and important task. We have performed a genome-wide analysis of 10664 SARS-CoV-2 genomes from 73 countries to identify and prepare a Single Nucleotide Polymorphism (SNP) dataset of SARS-CoV-2. Thereafter, hierarchical clustering is applied to this SNP data: Average Linkage and Complete Linkage are each applied separately with the Jaccard and Hamming distance functions in order to identify the virus strains as clusters present in the SNP data.
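
As a rough illustration of the clustering step, the sketch below runs a naive agglomerative clustering over binary SNP vectors, with a choice of Jaccard or Hamming distance and of average or complete linkage. The binary encoding (1 = SNP present at a site), the function names, the fixed number of clusters and the toy matrix are all assumptions for illustration; the project's own encoding and stopping rule are not described here. For thousands of genomes an optimised library routine, such as `scipy.cluster.hierarchy.linkage`, would be the practical choice.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """1 - |intersection| / |union| over the positions set to 1."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 0.0  # two all-zero vectors are treated as identical
    shared = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - shared / union


def agglomerative(vectors, n_clusters, distance, linkage="average"):
    """Merge the two closest clusters until n_clusters remain.

    Returns the clusters as sorted lists of row indices.
    """
    if linkage not in ("average", "complete"):
        raise ValueError("linkage must be 'average' or 'complete'")

    # Pairwise distances between individual genomes, computed once.
    dist = {
        (i, j): distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def cluster_distance(c1, c2):
        values = [dist[(min(i, j), max(i, j))] for i in c1 for j in c2]
        if linkage == "complete":
            return max(values)            # farthest pair of members
        return sum(values) / len(values)  # mean over all member pairs

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


# Toy SNP matrix: rows are genomes, columns are SNP sites (hypothetical).
snp_matrix = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
for metric in (jaccard, hamming):
    for method in ("average", "complete"):
        print(metric.__name__, method,
              agglomerative(snp_matrix, 2, metric, method))
```

On this toy matrix all four combinations group rows 0 and 1 together and rows 2 and 3 together; on real SNP data the linkage and distance choices can produce different clusters, which is presumably why they are applied separately.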

Immunogenicity and antigenicity based T-cell and B-cell epitopes identification from conserved regions of 10664 SARS-CoV-2 genomes

The surge of SARS-CoV-2 has created a pandemic wave around the globe due to its high transmission rate. To contain this virus, researchers are working around the clock for a solution in the form of a vaccine. The economy and healthcare have suffered immensely around the globe due to this pandemic, so an efficient vaccine design is the need of the hour. Moreover, to obtain a generalised vaccine for a heterogeneous human population, virus genomes from different countries should be considered. Thus, in this work, we have performed a genome-wide analysis of 10,664 SARS-CoV-2 genomes from 73 countries in order to identify potential conserved regions for the development of a peptide-based synthetic vaccine, viz. epitopes with high immunogenic and antigenic scores. For this, the multiple sequence alignment tool Clustal Omega is used to align the 10,664 SARS-CoV-2 virus genomes.
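
Once the genomes are aligned, conserved regions can be located by scanning the alignment column by column. The sketch below is one simple way to do this, assuming strict conservation (every genome carries the same non-gap base in a column) and a hypothetical minimum run length `min_length`. Neither the conservation criterion nor any length threshold used by the project is stated here, and the later scoring of epitopes for immunogenicity and antigenicity is not covered.

```python
def conserved_regions(aligned, min_length=9):
    """Find gap-free runs of columns where all aligned sequences agree.

    Returns (start, end, sequence) tuples in alignment coordinates,
    with `start` 0-based inclusive and `end` exclusive.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    regions = []
    start = None  # start column of the run currently being extended
    # Iterate one column past the end so a trailing run is closed too.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                regions.append((start, col, aligned[0][start:col]))
            start = None
    return regions


# Toy alignment (hypothetical data, not real SARS-CoV-2 sequence).
alignment = [
    "ATGGCTAGCTTAC",
    "ATGGCTAGCATAC",
    "ATGGCTAGC-TAC",
]
print(conserved_regions(alignment, min_length=4))
# [(0, 9, 'ATGGCTAGC')]
```

A stricter or looser criterion, for example allowing a small fraction of genomes to differ in a column, would only change how `conserved` is computed.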

COVID-DeepPredictor: Recurrent Neural Network to Predict SARS-CoV-2 and Other Pathogenic Viruses

COVID-19, the disease caused by the novel coronavirus (SARS-CoV-2), has turned out to be a global pandemic. The high transmission rate of this pathogenic virus demands early prediction and proper identification for subsequent treatment. However, the polymorphic nature of this virus allows it to adapt and survive in different kinds of environments, which makes it difficult to predict. In addition, there are other pathogens such as SARS-CoV-1, MERS-CoV, Ebola, Dengue and Influenza, so a predictor is needed that can distinguish them using their genomic information. To address this problem, COVID-DeepPredictor is proposed in this work, built on a deep learning framework, to identify an unknown sequence as one of these pathogens. COVID-DeepPredictor uses Long Short Term Memory, a type of Recurrent Neural Network, for the underlying prediction with an alignment-free technique.
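
An alignment-free approach has to turn raw genomic sequences into numeric input without aligning them first. The alignment-free technique used by COVID-DeepPredictor is not named here, so the sketch below shows only one common option, overlapping k-mer tokenisation, as an assumption. The functions `build_kmer_vocab` and `encode`, the value k = 3 and the padding scheme are hypothetical choices for illustration. The resulting fixed-length lists of integer ids are the kind of input typically fed to an embedding layer followed by an LSTM.

```python
from itertools import product


def build_kmer_vocab(k, alphabet="ACGT"):
    """Map every possible k-mer to an integer id starting at 1.

    Id 0 is reserved for padding and for k-mers that contain
    characters outside the alphabet (for example 'N').
    """
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: i for i, kmer in enumerate(kmers, start=1)}


def encode(sequence, vocab, k, max_len):
    """Convert a nucleotide string into a fixed-length list of k-mer ids."""
    seq = sequence.upper()
    # Overlapping k-mers with stride 1; unknown k-mers map to 0.
    ids = [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]
    ids = ids[:max_len]                       # truncate long sequences
    return ids + [0] * (max_len - len(ids))   # pad short sequences


vocab = build_kmer_vocab(3)
print(len(vocab))                      # 64
print(encode("ATGCN", vocab, 3, 5))    # [15, 58, 0, 0, 0]
```

Because no alignment is needed, sequences of different lengths from different viruses can be encoded independently and batched together for training or prediction.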




Dr. Indrajit Saha

Principal Investigator


Dr. Nimisha Ghosh

Project Collaborator


Nikhil Sharma

Project Intern


Jnanendra Prasad Sarkar

Ph.D. Student


Suman Nandi

Junior Research Fellow


Debasree Maity

Project Intern