Online Predictor Using Machine Learning to Predict
Novel Coronavirus and Other Pathogenic Viruses


Jnanendra Prasad Sarkar1,2,+, Indrajit Saha3,+,*, Nimisha Ghosh4,5, Debasree Maity6


1Larsen & Toubro Infotech Ltd., Pune, India
2Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
3Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
4Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
5Department of Computer Science and Information Technology, Institute of Technical Education and Research,
Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
6MCKV Institute of Engineering, Liluah, Howrah, India
*Correspondence should be addressed to the team leader: indrajit@nitttrkol.ac.in
+These team members contributed equally to this work



ABSTRACT

Virus classification has been a concern in virology and epidemiology for decades. Machine learning can address it by predicting the novel coronavirus from its sequence. We therefore propose a machine learning based novel coronavirus prediction technique, called COVID-Predictor, in which thousands of sequences of SARS-CoV-1, MERS-CoV, SARS-CoV-2 and other viruses are used to train a Naive Bayes classifier so that it can predict unknown sequences of these viruses. The model has been validated using 10-fold cross-validation and compared with other machine learning techniques. The results show the superiority of our predictor, which achieves an average accuracy of 99.3% on an unseen validation set of viruses. The same pre-trained model has been used to build a web-based application where sequences of unknown viruses can be uploaded to predict the novel coronavirus.
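The workflow described in the abstract (train a Naive Bayes classifier on labelled sequences, then estimate accuracy with 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the COVID-Predictor code: the abstract does not say how sequences are turned into features, so the overlapping 3-mer counts used here are an assumption, and the names `kmer_counts`, `KmerNaiveBayes`, `cross_validate` and `toy_sequence` are hypothetical. The demo data are random synthetic strings, not real genomes.

```python
import math
import random
from collections import Counter, defaultdict


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (assumed feature choice)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerNaiveBayes:
    """Multinomial Naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, seqs, labels):
        self.class_counts = Counter(labels)
        self.kmer_totals = defaultdict(Counter)
        for seq, label in zip(seqs, labels):
            self.kmer_totals[label].update(kmer_counts(seq, self.k))
        self.vocab = {km for c in self.kmer_totals.values() for km in c}
        return self

    def predict(self, seq):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        feats = kmer_counts(seq, self.k)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.kmer_totals[label].values())
            score = math.log(count / n)  # log prior
            for km, c in feats.items():
                if km not in self.vocab:
                    continue  # ignore k-mers never seen in training
                p = (self.kmer_totals[label][km] + self.alpha) / (total + self.alpha * v)
                score += c * math.log(p)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(seqs, labels, folds=10, seed=0):
    """Return mean accuracy over `folds` held-out splits."""
    if len(seqs) < folds:
        raise ValueError("need at least one sequence per fold")
    idx = list(range(len(seqs)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = KmerNaiveBayes().fit(
            [seqs[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        hits = sum(model.predict(seqs[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / folds


def toy_sequence(rng, weights, length=60):
    """Synthetic stand-in for a genome: random bases with a given bias."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))


if __name__ == "__main__":
    rng = random.Random(1)
    # Two artificial classes with different base composition (not real viruses).
    seqs = [toy_sequence(rng, [6, 1, 1, 2]) for _ in range(20)]
    seqs += [toy_sequence(rng, [1, 6, 2, 1]) for _ in range(20)]
    labels = ["toy_a"] * 20 + ["toy_b"] * 20
    print("mean 10-fold accuracy:", cross_validate(seqs, labels, folds=10))
```

Each fold trains a fresh model on nine tenths of the data and scores it on the remaining tenth, so no sequence is used for both training and testing within a fold.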

Predictor


Please submit SARS-CoV-2 sequences as a CSV file smaller than 1 MB, formatted as shown here.
The output labels are as follows:
1 -> SARS-CoV-1
2 -> MERS-CoV
3 -> SARS-CoV-2
4 -> Other Virus (Dengue and Ebola)
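For anyone scripting around the predictor, the numeric output codes listed above can be mapped back to virus names with a small lookup. The sketch below also checks the upload size limit before submission. The function names are hypothetical, the CSV layout itself is not assumed here, and "1MB" is read as 10**6 bytes, which is an assumption.

```python
import os

# Label codes as listed on this page.
LABELS = {
    1: "SARS-CoV-1",
    2: "MERS-CoV",
    3: "SARS-CoV-2",
    4: "Other Virus (Dengue and Ebola)",
}

# Assumption: the "< 1MB" limit is interpreted as fewer than 10**6 bytes.
MAX_UPLOAD_BYTES = 1_000_000


def decode_predictions(codes):
    """Translate predictor output codes (ints or numeric strings) to names."""
    return [LABELS[int(code)] for code in codes]


def is_within_size_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is smaller than the upload limit."""
    return os.path.getsize(path) < limit
```

For example, `decode_predictions([3, 1])` yields the names for SARS-CoV-2 and SARS-CoV-1 in that order; an unknown code raises `KeyError` rather than being silently mapped.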




Supplementary


datasets


SARS-CoV-1       MERS-CoV       SARS-CoV-2       Other virus

code


The code/technique/algorithm is implemented in Python, and the code is available in zipped form here. It may be used free of charge for academic and non-commercial purposes. If you use this code/technique/algorithm, please cite this work.

For any queries regarding the algorithms, please email indrajit@nitttrkol.ac.in

Disclaimer:
The virus genomes were collected from public databases such as NCBI and GISAID to develop COVID-Predictor. NITTTR, Kolkata therefore bears no responsibility for its prediction accuracy.