Online Predictor Using Machine Learning to Predict
Novel Coronavirus and Other Pathogenic Viruses
Jnanendra Prasad Sarkar1,2,+, Indrajit Saha3,+,*, Nimisha Ghosh4,5, Debasree Maity6
1Larsen & Toubro Infotech Ltd., Pune, India
2Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
3Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
4Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
5Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
6MCKV Institute of Engineering, Liluah, Howrah, India
*Correspondence should be addressed to the team leader: indrajit@nitttrkol.ac.in
+These team members contributed equally to this work.
ABSTRACT
Virus classification has long been a concern in virology and epidemiology. Machine learning techniques can be used to predict the novel coronavirus from its sequence. We therefore propose a machine learning based novel coronavirus prediction technique, called COVID-Predictor, in which thousands of sequences of SARS-CoV-1, MERS-CoV, SARS-CoV-2 and other viruses are used to train a Naive Bayes classifier that can then predict the class of unknown sequences of these viruses. The model has been validated using 10-fold cross-validation and compared with other machine learning techniques. The results show the superiority of our predictor, which achieves an average accuracy of 99.3% on unseen validation sets of viruses. The same pre-trained model has been used to build a web-based application to which sequences of unknown viruses can be uploaded to predict the novel coronavirus.
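The abstract does not say how raw sequences are turned into classifier inputs. The sketch below is therefore only an illustration under stated assumptions: each sequence is represented by its overlapping k-mer counts (k = 3 here, an arbitrary choice), a multinomial Naive Bayes model with add-one smoothing is trained on those counts, and accuracy is averaged over 10 folds. The names `kmer_counts`, `MultinomialNaiveBayes` and `k_fold_accuracy` are hypothetical and are not taken from the COVID-Predictor code; the toy labels `virus_A` and `virus_B` stand in for real virus classes.

```python
import math
import random
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a nucleotide sequence (assumed featurization)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over k-mer counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, features, labels):
        self.classes = sorted(set(labels))
        class_sizes = Counter(labels)
        # Log prior: fraction of training sequences in each class.
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for feats, label in zip(features, labels):
            self.counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, feats, c):
        denom = self.totals[c] + self.alpha * len(self.vocab)
        # k-mers never seen in training are ignored, as with a fixed vocabulary.
        return sum(n * math.log((self.counts[c][kmer] + self.alpha) / denom)
                   for kmer, n in feats.items() if kmer in self.vocab)

    def predict(self, feats):
        # Pick the class with the highest log posterior (up to a constant).
        return max(self.classes,
                   key=lambda c: self.log_prior[c] + self._log_likelihood(feats, c))


def k_fold_accuracy(seqs, labels, folds=10, k=3, seed=0):
    """Average accuracy over `folds` shuffled, disjoint test folds."""
    if len(seqs) < folds:
        raise ValueError("need at least one sequence per fold")
    indices = list(range(len(seqs)))
    random.Random(seed).shuffle(indices)
    features = [kmer_counts(s, k) for s in seqs]
    accuracies = []
    for f in range(folds):
        test_idx = set(indices[f::folds])
        train_idx = [i for i in indices if i not in test_idx]
        model = MultinomialNaiveBayes().fit([features[i] for i in train_idx],
                                            [labels[i] for i in train_idx])
        correct = sum(model.predict(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


if __name__ == "__main__":
    # Synthetic toy data only: two classes with different base compositions.
    rng = random.Random(1)
    seqs = (["".join(rng.choice("AAAATCG") for _ in range(200)) for _ in range(20)]
            + ["".join(rng.choice("GGGGCTA") for _ in range(200)) for _ in range(20)])
    labels = ["virus_A"] * 20 + ["virus_B"] * 20
    print("10-fold accuracy:", k_fold_accuracy(seqs, labels))
```

Naive Bayes suits this kind of count data because training reduces to tallying k-mer frequencies per class, so it scales easily to thousands of genomes; the actual feature extraction and fold construction used by the authors may differ.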
Please submit SARS-CoV-2 sequences as a CSV file smaller than 1 MB, in the format shown here.
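The exact CSV layout expected by the web application is not described in this text, so the following is only a hypothetical client-side check before uploading. It assumes a header row with a column named `sequence`, reads "1 MB" as 1 MiB (1,048,576 bytes), and accepts only the letters A, C, G, T, U and N; all three are assumptions, and `read_uploaded_sequences` is an illustrative name, not part of the COVID-Predictor code.

```python
import csv
import io

MAX_UPLOAD_BYTES = 1024 * 1024  # assumption: "1 MB" interpreted as 1 MiB
VALID_BASES = set("ACGTUN")     # assumption: nucleotides plus ambiguity code N


def read_uploaded_sequences(raw: bytes, column: str = "sequence") -> list:
    """Validate a hypothetical CSV upload and return its sequences in upper case."""
    if len(raw) >= MAX_UPLOAD_BYTES:
        raise ValueError("file must be smaller than 1 MB")
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"missing '{column}' column")
    sequences = []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        seq = (row[column] or "").strip().upper()
        if not seq or not set(seq) <= VALID_BASES:
            raise ValueError(f"row {line_no}: invalid sequence")
        sequences.append(seq)
    return sequences
```

For example, `read_uploaded_sequences(b"id,sequence\ns1,acgt\n")` returns `["ACGT"]`, while a file without a `sequence` column or one at or above the size limit raises `ValueError`.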
The code/technique/algorithm is implemented in Python. The code is available in zipped form here.
The code/technique/algorithm is free to use for academic and non-commercial purposes. If you use this code/technique/algorithm, please cite this work.
For any queries regarding the algorithms, please email indrajit@nitttrkol.ac.in.