A New Evolutionary Rough Fuzzy Integrated Machine Learning Technique for microRNA selection
using Next-Generation Sequencing data of Breast Cancer


Jnanendra Prasad Sarkar1,+, Indrajit Saha2,+,*, Somnath Rakshit3,+, Monalisa Pal4,
Michal Wlasnowolski3,5, Anasua Sarkar6, Ujjwal Maulik6, and Dariusz Plewczynski3,5


1Larsen & Toubro Infotech Ltd.
2Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
3Centre of New Technologies, University of Warsaw, Poland
4Machine Intelligence Unit, Indian Statistical Institute
5Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
6Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
*Correspondence should be addressed to indrajit@nitttrkol.ac.in
+These authors contributed equally to this work



ABSTRACT

MicroRNAs (miRNAs) play an important role in various biological processes by regulating gene expression. Their abnormal expression may lead to cancer, so analysing miRNA expression data may yield biological insight useful for cancer diagnosis. Many feature selection methods have recently been developed to identify such miRNAs. Each has its own merits and demerits, as the task is very challenging. In this article, we therefore propose a novel wrapper-based feature selection technique that integrates Rough and Fuzzy sets, Random Forest and Particle Swarm Optimization to identify putative miRNAs that solve the underlying biological problem effectively, i.e. separating tumour and control samples. Here, Rough and Fuzzy sets address the vagueness and overlapping characteristics of the dataset during clustering, while Random Forest performs classification on the clustering results to yield better solutions. The integrated clustering and classification tasks form the underlying optimization problem for the Particle Swarm Optimization method, in which particles encode features, in this case miRNAs. The performance of the proposed wrapper-based method is demonstrated quantitatively and visually on next-generation sequencing data of breast cancer from The Cancer Genome Atlas (TCGA). Finally, the selected miRNAs are validated through biological significance tests.
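The wrapper idea described in the abstract, where each particle encodes a subset of features and is scored by a learner trained on that subset, can be illustrated with a minimal binary Particle Swarm Optimization sketch. This is not the authors' implementation. It makes several assumptions for the sake of a short, dependency-free example: a leave-one-out nearest-centroid classifier stands in for the rough-fuzzy clustering and Random Forest stages, the data are a synthetic toy set rather than TCGA miRNA profiles, and all function names, the penalty term and the PSO parameters (`w`, `c1`, `c2`) are hypothetical choices.

```python
import math
import random


def loo_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features.

    Stand-in scorer (assumption): the paper uses rough-fuzzy clustering plus Random Forest.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    labels = sorted(set(y))
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in labels:
            # Class centroid computed without the held-out sample i.
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                continue
            centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
            dist = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.05):
    """Accuracy minus a small penalty on the fraction of features kept (hypothetical choice)."""
    if not any(mask):
        return -1.0  # an empty feature subset cannot classify anything
    return loo_centroid_accuracy(X, y, mask) - penalty * sum(mask) / len(mask)


def binary_pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over features."""
    rng = random.Random(seed)
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    pos[0] = [1] * n  # seed one particle with the full feature set
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n):
                # Standard velocity update, clamped to keep the sigmoid responsive.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, vel[i][j]))
                # Sigmoid transfer: velocity becomes the probability of selecting feature j.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def make_toy_data(n_samples=40, n_features=8, seed=1):
    """Synthetic two-class data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
best_mask, best_fit = binary_pso(X, y)
print("selected feature indices:", [j for j, b in enumerate(best_mask) if b])
print("fitness:", round(best_fit, 3))
```

Swapping `loo_centroid_accuracy` for a scorer built on clustering followed by a Random Forest would bring the sketch closer to the pipeline the abstract outlines. The search loop itself would not need to change, since it only calls `fitness`.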

DATASETS


CODE


The algorithm is implemented in Python. The code is available in zipped form here. The algorithm is free to use for academic and non-commercial purposes. If you use these algorithms, please cite the following reference:

J. P. Sarkar, I. Saha, S. Rakshit, M. Pal, M. Wlasnowolski, A. Sarkar, U. Maulik and D. Plewczynski, "A New Evolutionary Rough Fuzzy Integrated Machine Learning Technique for microRNA selection using Next-Generation Sequencing data of Breast Cancer", accepted at the Genetic and Evolutionary Computation Conference, Prague, Czech Republic (2019).

For any queries regarding the algorithms, please email indrajit@nitttrkol.ac.in