
A transferrable and interpretable multiple instance learning model for microsatellite instability prediction based on histopathology images

By Rui Cao, Fan Yang, Si-Cong Ma, Li Liu, Yan Li, De-Hua Wu, Yu Zhao, Tong-Xin Wang, Wei-Jia Lu, Wei-Jing Cai, Hong-Bo Zhu, Xue-Jun Guo, Yu-Wen Lu, Jun-Jie Kuang, Wen-Jing Huan, Wei-Min Tang, Junzhou Huang, Jianhua Yao, Zhong-Yi Dong

Posted 03 Mar 2020
bioRxiv DOI: 10.1101/2020.02.29.971150

Background: Microsatellite instability (MSI) is a negative prognostic factor for colorectal cancer (CRC) and can be used as a pan-cancer predictor of immunotherapy success. However, current MSI identification methods are not available for all patients. We propose an ensemble multiple instance learning (MIL)-based deep learning model to predict MSI status directly from histopathology images.

Design: Two cohorts of patients were collected: 429 from The Cancer Genome Atlas (TCGA-COAD) and 785 from a self-collected Asian data set (Asian-CRC). The initial model was developed and validated in TCGA-COAD, and then generalized to Asian-CRC through transfer learning. Pathological signatures extracted from the model were associated with genotypes to interpret the model.

Results: A model called Ensembled Patch Likelihood Aggregation (EPLA) was developed in the TCGA-COAD training set based on two consecutive stages: patch-level prediction and WSI-level prediction. The EPLA model achieved an area under the curve (AUC) of 0.8848 in the TCGA-COAD test set, outperforming the state-of-the-art approach, and an AUC of 0.8504 in Asian-CRC after transfer learning. Furthermore, the five pathological imaging signatures identified using the model are associated with genomic and transcriptomic profiles, which makes the MIL model interpretable. The results show that our model recognizes pathological signatures related to mutation burden, DNA repair pathways, and immunity.

Conclusion: Our MIL-based deep learning model can effectively predict MSI from histopathology images and is transferable to a new patient cohort. The interpretability of our model, established by association with genomic and transcriptomic biomarkers, lays the foundation for prospective clinical research.
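The two consecutive stages named in the abstract (patch-level prediction, then WSI-level prediction) can be illustrated with a minimal sketch. This is not the authors' EPLA implementation: the abstract does not specify how patch likelihoods are aggregated or ensembled, so the two aggregators below (mean likelihood and fraction of high-likelihood patches), the 0.5 threshold, the equal-weight average, and all function names are assumptions chosen only to show the general shape of the idea. The patch-level probabilities are assumed to come from some already-trained patch classifier.

```python
import numpy as np


def slide_score_mean(patch_probs):
    """Hypothetical aggregator 1: mean patch-level MSI likelihood for one slide."""
    return float(np.mean(patch_probs))


def slide_score_high_fraction(patch_probs, threshold=0.5):
    """Hypothetical aggregator 2: fraction of patches at or above `threshold`."""
    probs = np.asarray(patch_probs, dtype=float)
    return float(np.mean(probs >= threshold))


def ensemble_slide_score(patch_probs, threshold=0.5):
    """Assumed ensemble rule: equal-weight average of the two aggregators."""
    return 0.5 * (
        slide_score_mean(patch_probs)
        + slide_score_high_fraction(patch_probs, threshold)
    )


if __name__ == "__main__":
    # Stage 1 output (assumed): per-patch MSI likelihoods for a single slide.
    patch_probs = [0.95, 0.85, 0.15, 0.75]
    # Stage 2: aggregate the patch likelihoods into one slide-level score.
    print(ensemble_slide_score(patch_probs))
```

In a real pipeline the slide-level stage would more likely be a trained classifier over aggregated patch features than a fixed average; the fixed rule here just keeps the sketch self-contained.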

Download data

  • Downloaded 654 times
  • Download rankings, all-time:
    • Site-wide: 48,419
    • In bioengineering: 907
  • Year to date:
    • Site-wide: 46,833
  • Since beginning of last month:
    • Site-wide: 67,726
