Optimized data representation and convolutional neural network model for predicting tumor purity

By Gerald J. Sun, David F. Jenkins, Pablo E. Cingolani, Jonathan R. Dry, Zhongwu Lai

Posted 17 Oct 2019
bioRxiv DOI: 10.1101/805135

Here we present a machine learning model, Deep Purity (DePuty), that leverages convolutional neural networks to accurately predict tumor purity from next-generation sequencing data from clinical samples without matched normals. As input, our model uses SNP-based copy number and minor allele frequency data formulated as a scatterplot image. With a representation matching that used by expert human annotators, we outperform an existing algorithm using only ~100 manually curated samples. Our simple, data-efficient approach can serve as a straightforward alternative to traditional, more complex statistical methods for building performant purity prediction models that enable downstream bioinformatic analysis of tumor variants and absolute copy number alterations relevant to cancer genomics.
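To make the input representation concrete, the following is a minimal sketch of how per-SNP copy number and minor allele frequency values could be rasterized into a scatterplot-like image suitable for a convolutional network. It is not the authors' implementation: the function name `rasterize_scatter`, the use of a log copy ratio on the horizontal axis, the axis ranges, and the 32 x 32 image size are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def rasterize_scatter(
    log_ratios: Sequence[float],
    mafs: Sequence[float],
    size: int = 32,                                    # assumed image size
    log_ratio_range: Tuple[float, float] = (-2.0, 2.0),  # assumed x-axis range
    maf_range: Tuple[float, float] = (0.0, 0.5),         # assumed y-axis range
) -> List[List[float]]:
    """Bin (copy-number log ratio, MAF) points into a size x size grid.

    Each cell holds the number of SNPs falling in it, scaled so that the
    densest cell equals 1.0. Rows index MAF bins, columns index log-ratio bins.
    """
    if len(log_ratios) != len(mafs):
        raise ValueError("log_ratios and mafs must have the same length")

    x_lo, x_hi = log_ratio_range
    y_lo, y_hi = maf_range
    counts = [[0] * size for _ in range(size)]

    for x, y in zip(log_ratios, mafs):
        # Clip out-of-range values onto the border so no SNP is dropped.
        x = min(max(x, x_lo), x_hi)
        y = min(max(y, y_lo), y_hi)
        # Map each coordinate to a bin index; the upper edge goes in the last bin.
        col = min(int((x - x_lo) / (x_hi - x_lo) * size), size - 1)
        row = min(int((y - y_lo) / (y_hi - y_lo) * size), size - 1)
        counts[row][col] += 1

    peak = max(max(r) for r in counts)
    if peak == 0:
        return [[0.0] * size for _ in range(size)]
    return [[c / peak for c in r] for r in counts]


# Toy usage: three hypothetical SNPs on a 4 x 4 grid.
image = rasterize_scatter([0.0, 0.0, 1.0], [0.25, 0.25, 0.5], size=4)
```

The resulting grid can be treated as a single-channel image. Whether density scaling, a fixed axis range, or this resolution matches what the paper used is not stated in the abstract.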

Download data

  • Downloaded 343 times
  • Download rankings, all-time:
    • Site-wide: 82,155
    • In bioinformatics: 7,458
  • Year to date:
    • Site-wide: 95,627
  • Since beginning of last month:
    • Site-wide: 82,403
