miRNA-Prediction-V1

Genome-wide analysis of NGS data to compile cancer-specific panels of miRNA biomarkers

Shib Sankar Bhowmick^1,2,+, Indrajit Saha^3,+,*, Debotosh Bhattacharjee^1, L. M. Genovese^4, and Filippo Geraci^4,+

¹Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
²Department of Electronics & Communication Engineering, Heritage Institute of Technology, Kolkata, India
³Department of Computer Science and Engineering, National Institute of Technical Teachers' Training & Research, Kolkata, India
⁴Institute for Informatics and Telematics, National Research Council, Pisa, Italy
*Correspondence should be addressed to indrajit@nitttrkol.ac.in
⁺These authors contributed equally to this work

ABSTRACT

MicroRNAs are small non-coding RNAs that influence gene expression by binding to the 3' UTR of target mRNAs to repress protein synthesis. Soon after their discovery, microRNA dysregulation was associated with several pathologies. In particular, microRNAs have often been reported as differentially expressed between healthy and tumor samples. This suggested that microRNAs are likely to be good candidate biomarkers for cancer diagnosis and personalized medicine. With the advent of Next-Generation Sequencing (NGS), measuring the expression level of the whole miRNAome at once has become routine. Moreover, collaborative data-sharing efforts open up the possibility of population-level analyses. This context motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers. We observed that the problem of finding biomarkers can be modeled as a two-class classification task: given the miRNAomes of a population of healthy and cancerous samples, we want to find the subset of microRNAs that yields the highest classification accuracy. We accomplish this task by leveraging a combination of data mining tools. In particular, we used differential evolution for candidate selection, component analysis to preserve the relationships among miRNAs, and SVM for sample classification. We identified 10 cancer-specific panels whose classification accuracy is always higher than 92%. These panels have very little overlap, suggesting that miRNAs are not only predictive of the onset of cancer but can also be used for classification purposes. We experimentally validated the contribution of each of the employed tools to the selection of discriminating miRNAs. Moreover, we tested the significance of each panel for the corresponding cancer type. In particular, enrichment analysis showed that the selected miRNAs are involved in oncogenesis pathways, while survival analysis showed that miRNAs can be used to evaluate cancer severity.
In summary, the results demonstrate that our method produces cancer-specific panels that are promising candidates for subsequent in vitro validation.
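The abstract frames biomarker discovery as a wrapper-style feature selection problem: a search procedure proposes subsets of miRNAs, and each subset is scored by how well a classifier separates healthy from tumor samples using only those miRNAs. The released implementation is in MATLAB; what follows is only a minimal Python sketch of that general idea, not the authors' code. Several parts are assumptions made for illustration: a nearest-centroid classifier stands in for the SVM, the component-analysis step is omitted, and the synthetic data, the 0.5 selection threshold, the per-miRNA size penalty and the even/odd hold-out split are all hypothetical choices.

```python
import random


def make_synthetic_data(n_per_class=30, n_mirnas=20, informative=(2, 7, 11), seed=0):
    """Hypothetical toy data: rows are samples, columns are miRNA expression values.
    Tumor samples (label 1) are shifted upward on the 'informative' miRNAs only."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_mirnas)]
            if label == 1:
                for j in informative:
                    row[j] += 3.0
            X.append(row)
            y.append(label)
    return X, y


def nearest_centroid_accuracy(X, y, features):
    """Stand-in for the SVM (an assumption): fit class centroids on even-indexed
    samples, then report accuracy on odd-indexed samples using only `features`."""
    if not features:
        return 0.0
    train = [i for i in range(len(X)) if i % 2 == 0]
    test = [i for i in range(len(X)) if i % 2 == 1]
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for i in test:
        dists = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(features, cent))
            for label, cent in centroids.items()
        }
        if min(dists, key=dists.get) == y[i]:
            correct += 1
    return correct / len(test)


def decode(vector, threshold=0.5):
    """Binarize a real-valued DE vector: miRNA j is in the panel if vector[j] > threshold."""
    return [j for j, v in enumerate(vector) if v > threshold]


def fitness(vector, X, y, penalty=0.01):
    """Accuracy minus a small per-miRNA penalty to favour compact panels (hypothetical)."""
    panel = decode(vector)
    return nearest_centroid_accuracy(X, y, panel) - penalty * len(panel)


def differential_evolution_select(X, y, pop_size=20, generations=40, F=0.5, CR=0.9, seed=1):
    """DE/rand/1/bin over [0, 1]^m vectors with greedy one-to-one replacement,
    updating the population in place within each generation."""
    rng = random.Random(seed)
    m = len(X[0])
    pop = [[rng.random() for _ in range(m)] for _ in range(pop_size)]
    scores = [fitness(p, X, y) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(m)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(m):
                if rng.random() < CR or j == j_rand:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to the search box
                else:
                    trial.append(pop[i][j])
            s = fitness(trial, X, y)
            if s >= scores[i]:  # keep the trial only if it is at least as good
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda k: scores[k])
    return decode(pop[best]), scores[best]


if __name__ == "__main__":
    X, y = make_synthetic_data()
    panel, score = differential_evolution_select(X, y)
    print("selected miRNA indices:", panel)
    print("penalized fitness:", round(score, 3))
```

Running the script prints the indices of the selected miRNAs. On this synthetic data the panel tends to concentrate on the informative columns, although the exact result depends on the random seeds. The size penalty reflects the goal of compact panels; whether and how the original method penalizes panel size is not stated here.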

CODE

All algorithms are implemented in MATLAB and executed serially. The code is available as a zip archive. For instructions on running the algorithms, please first read the Readme.txt file included in the archive.

The algorithms are free to use for academic and non-commercial purposes. If you use them, please cite the following reference:

S. S. Bhowmick, I. Saha, D. Bhattacharjee, L. M. Genovese and F. Geraci, "Genome-wide analysis of NGS data to compile cancer-specific panels of miRNA biomarkers", PLOS ONE, Vol. 13, No. 7, e0200353, 2018.

For any queries regarding the algorithms, please email indrajit@nitttrkol.ac.in.

BRCA: Breast invasive carcinoma
KIRC: Kidney renal clear cell carcinoma
LGG: Brain lower grade glioma
LIHC: Liver hepatocellular carcinoma
LUAD: Lung adenocarcinoma
PAAD: Pancreatic adenocarcinoma
PRAD: Prostate adenocarcinoma
SKCM: Skin cutaneous melanoma
STAD: Stomach adenocarcinoma
THCA: Thyroid carcinoma
All Normal

Supplementary material: datasets and code