A transferrable and interpretable multiple instance learning model for microsatellite instability prediction based on histopathology images
Posted 03 Mar 2020
bioRxiv DOI: 10.1101/2020.02.29.971150
Background: Microsatellite instability (MSI) is a negative prognostic factor for colorectal cancer (CRC) and can be used as a pan-cancer predictor of response to immunotherapy. However, current MSI identification methods are not available for all patients. We propose an ensemble multiple instance learning (MIL)-based deep learning model to predict MSI status directly from histopathology images.

Design: Two patient cohorts were collected: 429 patients from The Cancer Genome Atlas (TCGA-COAD) and 785 from a self-collected Asian data set (Asian-CRC). The initial model was developed and validated in TCGA-COAD and then generalized to Asian-CRC through transfer learning. For model interpretation, the pathological signatures extracted from the model were associated with genotypes.

Results: A model called Ensembled Patch Likelihood Aggregation (EPLA) was developed in the TCGA-COAD training set. It works in two consecutive stages: patch-level prediction followed by whole-slide image (WSI)-level prediction. The EPLA model achieved an area under the curve (AUC) of 0.8848 in the TCGA-COAD test set, outperforming the state-of-the-art approach, and an AUC of 0.8504 in Asian-CRC after transfer learning. Furthermore, the five pathological imaging signatures identified using the model are associated with genomic and transcriptomic profiles, which makes the MIL model interpretable. The results show that our model recognizes pathological signatures related to mutation burden, DNA repair pathways, and immunity.

Conclusion: Our MIL-based deep learning model can effectively predict MSI from histopathology images and is transferable to a new patient cohort. The interpretability of our model, established by association with genomic and transcriptomic biomarkers, lays the foundation for prospective clinical research.
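The two-stage design described in the abstract (patch-level prediction, then WSI-level prediction from an ensemble) can be illustrated with a minimal sketch. The abstract does not specify how patch likelihoods are aggregated or ensembled, so everything below is an assumption made for illustration: the two scorers (`histogram_score`, `high_confidence_fraction`), the bin count, the threshold, and the equal ensemble weights are hypothetical and are not taken from the EPLA paper. The patch-level likelihoods are assumed to come from some upstream patch classifier.

```python
def likelihood_histogram(patch_probs, bins=10):
    """Normalized histogram of patch-level MSI likelihoods in [0, 1]."""
    counts = [0] * bins
    for p in patch_probs:
        idx = min(int(p * bins), bins - 1)  # p == 1.0 falls in the last bin
        counts[idx] += 1
    total = len(patch_probs)
    return [c / total for c in counts]


def histogram_score(patch_probs, bins=10):
    """Hypothetical scorer 1: mean bin centre of the likelihood histogram."""
    hist = likelihood_histogram(patch_probs, bins)
    return sum(h * (i + 0.5) / bins for i, h in enumerate(hist))


def high_confidence_fraction(patch_probs, threshold=0.5):
    """Hypothetical scorer 2: fraction of patches predicted MSI."""
    return sum(p > threshold for p in patch_probs) / len(patch_probs)


def wsi_score(patch_probs, weights=(0.5, 0.5)):
    """Ensemble the two WSI-level scores (weights are an assumption)."""
    if not patch_probs:
        raise ValueError("a slide needs at least one patch likelihood")
    s1 = histogram_score(patch_probs)
    s2 = high_confidence_fraction(patch_probs)
    return weights[0] * s1 + weights[1] * s2


# Made-up patch likelihoods for two slides, for illustration only.
msi_like_slide = [0.92, 0.85, 0.77, 0.96, 0.40]
mss_like_slide = [0.05, 0.12, 0.30, 0.08, 0.55]
print(wsi_score(msi_like_slide) > wsi_score(mss_like_slide))
```

The point of the sketch is the multiple instance learning framing: a slide is a bag of patches, only the slide carries a label, and the slide-level decision is a function of the distribution of patch scores rather than of any single patch.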
- Downloaded 654 times
- Download rankings, all-time:
  - Site-wide: 48,419
  - In bioengineering: 907
- Year to date:
  - Site-wide: 46,833
- Since beginning of last month:
  - Site-wide: 67,726