Identification of Breast Cancer subtype specific MicroRNAs using Survival Analysis
to find their role in transcriptomic regulation


Michal Denkiewicz1,2,+, Indrajit Saha3,+,*, Somnath Rakshit1,3, Jnanendra Prasad Sarkar4,5, and Dariusz Plewczynski1,6,*


1Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland
2College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Warsaw, Poland
3Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
4Larsen & Toubro Infotech Ltd., Pune, India
5Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
6Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
*Correspondence should be addressed to indrajit@nitttrkol.ac.in and dariuszplewczynski@cent.uw.edu.pl
+These authors contributed equally to this work



ABSTRACT

MicroRNAs (miRNAs) play a significant role in the development of breast cancer, and their expression profiles differ between breast cancer subtypes. Our goal is therefore to use high-throughput miRNA expression data from Next Generation Sequencing together with clinical data, in an integrated fashion, to perform survival analysis that identifies breast cancer subtype-specific miRNAs, and to analyze the associated genes and transcription factors. We select the top 100 miRNAs for each of the four subtypes based on hazard ratio and p-value, and then identify 44 miRNAs that are related to all four subtypes, which we call 4-star miRNAs. In addition, 12, 14, 9 and 15 subtype-specific miRNAs, which we call 1-star miRNAs, are identified. The resulting miRNAs are validated using machine learning methods to differentiate tumor cases from controls (for 4-star miRNAs) and between subtypes (for 1-star miRNAs). The 4-star miRNAs provide 95% average accuracy, while for 1-star miRNAs 81% accuracy is achieved for HER2-Enriched. Differences in miRNA expression between cancer stages are also analyzed, and a subset of 8 miRNAs is found whose expression is increased in stage II relative to stage I, including hsa-miR-10b-5p, which contributes to breast cancer metastasis. Subsequently, we prepare regulatory networks to identify the interactions among miRNAs, their target genes, and the transcription factors (TFs) that target those miRNAs. In this way, key regulatory circuits are identified in which genes such as TP53, ESR1, BRCA1, MYC and others, known to be important genetic factors in breast cancer, produce transcription factors that target the same genes and also interact with the selected miRNAs. To provide further biological validation, Protein-Protein Interaction (PPI) networks are prepared, and KEGG pathway and GO enrichment analyses are performed. Among the enriched pathways, many are breast cancer-related, such as the PI3K-Akt and p53 signaling pathways, and contain proteins such as TP53 that are also present in the regulatory networks. Moreover, we find that the genes are enriched in GO terms associated with breast cancer. Our results provide a detailed analysis of the selected miRNAs and their regulatory networks.
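
The grouping of miRNAs into 4-star and 1-star sets described in the abstract can be illustrated with a minimal sketch. It is written in Python for readability and is not the MATLAB implementation distributed on this page. It assumes the per-subtype top lists have already been produced by the survival analysis, and that "n-star" means a miRNA appears in the top lists of exactly n subtypes. The function names and the toy miRNA identifiers are hypothetical.

```python
def star_membership(top_by_subtype):
    """Map each miRNA to the set of subtypes whose top list contains it."""
    membership = {}
    for subtype, mirnas in top_by_subtype.items():
        for mirna in set(mirnas):  # ignore duplicates within one list
            membership.setdefault(mirna, set()).add(subtype)
    return membership


def select_by_stars(top_by_subtype, stars):
    """Return miRNAs found in exactly `stars` subtype lists, with those subtypes."""
    membership = star_membership(top_by_subtype)
    return {
        mirna: sorted(subtypes)
        for mirna, subtypes in membership.items()
        if len(subtypes) == stars
    }


# Toy input with placeholder identifiers (not results from the paper).
# In the study, each list would hold the top 100 miRNAs for that subtype.
toy_top_lists = {
    "LA": ["miR-a", "miR-b", "miR-c"],
    "LB": ["miR-a", "miR-d"],
    "HER2": ["miR-a", "miR-b", "miR-e"],
    "BL": ["miR-a", "miR-f"],
}

four_star = select_by_stars(toy_top_lists, 4)  # shared by all four subtypes
one_star = select_by_stars(toy_top_lists, 1)   # specific to a single subtype

print("4-star:", four_star)
print("1-star:", one_star)
```

In this toy input, miR-a is the only 4-star miRNA, miR-b appears in two lists and so falls into neither group, and each remaining miRNA is 1-star for its own subtype.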

Supplementary


datasets


LA: Luminal A, LB: Luminal B, HER2, BL: Basal-like, Control

code


The algorithm is implemented in MATLAB. The code is available in zipped form here. The algorithm is free to use for academic and non-commercial purposes. If you use it, please cite the following reference:

M. Denkiewicz, I. Saha, S. Rakshit, J. P. Sarkar and D. Plewczynski, "Identification of Breast Cancer subtype specific MicroRNAs using Survival Analysis to find their role in transcriptomic regulation", submitted to Frontiers in Genetics (2019).

For any queries regarding the algorithm, please email indrajit@nitttrkol.ac.in