This project focuses on improving the detection of pathogenic strains directly from metagenomic data, a crucial step in advancing infectious disease diagnostics and surveillance. Unlike traditional methods that rely on culturing or prior isolation, metagenomics enables the analysis of entire microbial communities, including rare or low-abundance pathogens. However, accurately identifying disease-causing organisms within these complex datasets remains challenging because of the genetic similarity among strains and the sheer diversity of microbial populations. This project addresses these limitations by improving strain-level resolution, sensitivity, and specificity, enabling reliable identification of pathogens in real-world samples.
TransVi: A Transformer-based Web Application for Pathogenic Virus Identification
TransVi is a web-based classification tool that uses transformer architectures to predict pathogenic viruses directly from metagenomic sequences. Its classification model is first pretrained on genomic sequences using DNABERT, so that it captures biological context and long-range dependencies, and is then fine-tuned on labelled viral data for high-accuracy identification of viruses such as SARS-CoV-1, MERS, SARS-CoV-2, Ebola, Dengue, and Influenza. TransVi shows how large language models can resolve short-read ambiguities and offers a scalable approach to real-time pathogen surveillance and outbreak response.
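DNABERT represents DNA as overlapping k-mers (k between 3 and 6) rather than single nucleotides, so reads must be converted to that form before classification. The sketch below shows one way to do this, assuming k = 6 and a 512-token input limit that leaves 510 k-mer slots after the [CLS] and [SEP] tokens. The function names, the chunking rule, and the label-to-ID mapping are hypothetical illustrations and are not taken from TransVi's code.

```python
# Hypothetical preprocessing helpers for a DNABERT-style classifier.
# Label order and IDs are illustrative assumptions, not TransVi's mapping.
VIRUS_LABELS = ["SARS-CoV-1", "MERS", "SARS-CoV-2", "Ebola", "Dengue", "Influenza"]
LABEL2ID = {name: i for i, name in enumerate(VIRUS_LABELS)}


def seq_to_kmers(seq: str, k: int = 6) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    # A read shorter than k yields no k-mers (empty range).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def to_model_inputs(seq: str, k: int = 6, max_tokens: int = 510) -> list[str]:
    """Chunk the k-mer list so each chunk fits an assumed 512-token limit
    (510 k-mers plus [CLS] and [SEP]); each chunk is a space-joined string."""
    kmers = seq_to_kmers(seq, k)
    return [" ".join(kmers[i:i + max_tokens]) for i in range(0, len(kmers), max_tokens)]


read = "acgtacgtag"
print(to_model_inputs(read))   # ['ACGTAC CGTACG GTACGT TACGTA ACGTAG']
print(LABEL2ID["SARS-CoV-2"])  # 2
```

Because adjacent 6-mers share five bases, a 10-nt read yields five tokens. The sketch simply cuts the k-mer list at chunk boundaries; a production pipeline might instead use overlapping windows so that no context is lost at the cut.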
MetaTrans: A Transformer-Integrated Web Application to Improve the Detection of Pathogenic Strains from Metagenomic Data
MetaTrans combines supervised and unsupervised techniques to improve strain-level pathogen detection from metagenomic data. Built on large language models, it operates in three phases: initial classification (model CLM), clustering of unlabelled data (model CLT), and retraining with the enriched annotations (model CLM*). This iterative pipeline improves sensitivity and contextual accuracy, enabling precise identification of viruses such as SARS-CoV-1, MERS, and SARS-CoV-2. MetaTrans can support intelligent annotation for biomedical research and public health applications.
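The classify, cluster, and retrain loop described above can be illustrated with a toy sketch in which simple stand-ins replace the transformer models: a nearest-centroid classifier plays the role of CLM and CLM*, k-means with deterministic farthest-point initialisation plays the role of CLT, and each cluster's majority CLM prediction becomes a pseudo-label for the unlabelled sequences in that cluster. The two-dimensional feature vectors, the function names, and the majority-vote rule are all assumptions for illustration, not MetaTrans's actual method.

```python
import math
from collections import Counter

Vector = list[float]


def centroid(points: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def train_classifier(X: list[Vector], y: list[str]) -> dict[str, Vector]:
    """Stand-in for CLM/CLM*: a nearest-centroid classifier, NOT a transformer."""
    groups: dict[str, list[Vector]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in groups.items()}


def predict(model: dict[str, Vector], x: Vector) -> str:
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(x, model[label]))


def kmeans(X: list[Vector], k: int, iters: int = 20) -> list[int]:
    """Stand-in for CLT: k-means with farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(X, key=lambda x: min(math.dist(x, c) for c in centers)))
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return assign


def metatrans_round(X_lab: list[Vector], y_lab: list[str],
                    X_unlab: list[Vector], k: int):
    """One classify -> cluster -> retrain round; returns (CLM*, pseudo-labels)."""
    clm = train_classifier(X_lab, y_lab)            # phase 1: CLM
    clusters = kmeans(X_unlab, k)                   # phase 2: CLT
    preds = [predict(clm, x) for x in X_unlab]
    # Each cluster inherits the majority CLM prediction of its members.
    majority = {
        c: Counter(p for p, a in zip(preds, clusters) if a == c).most_common(1)[0][0]
        for c in set(clusters)
    }
    pseudo = [majority[a] for a in clusters]
    clm_star = train_classifier(X_lab + X_unlab, y_lab + pseudo)  # phase 3: CLM*
    return clm_star, pseudo


# Toy usage with made-up 2-D features.
X_lab = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
y_lab = ["SARS-CoV-2", "SARS-CoV-2", "MERS", "MERS"]
X_unlab = [[0.2, 0.1], [0.0, 0.3], [4.9, 5.2], [5.3, 4.8]]
clm_star, pseudo = metatrans_round(X_lab, y_lab, X_unlab, k=2)
print(pseudo)  # ['SARS-CoV-2', 'SARS-CoV-2', 'MERS', 'MERS']
```

Assigning labels per cluster rather than per sequence is what lets the unsupervised phase correct isolated classifier mistakes: a sequence that CLM mislabels is outvoted by the other members of its cluster before CLM* is retrained.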
The algorithms are implemented in Python. The code and datasets are available at the following links.
Use of the code/technique/algorithm is free for academic and non-commercial purposes.
If you use this code/technique/algorithm, please cite this work.
For any queries regarding the algorithms, please email indrajit@nitttrkol.ac.in
Kathleen Marchal, Lead Principal Investigator
Dr. Indrajit Saha, Lead Principal Investigator
Debi Prasad Mishra, Co-Principal Investigator
Jan Fostier, Co-Principal Investigator
Sigrid De Keersmaecker, Co-Principal Investigator
Priyasi Mallick, Junior Research Fellow