Rxivist logo

Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 71,060 bioRxiv papers from 310,082 authors.

Computational proteogenomic identification and functional interpretation of translated fusions and micro structural variations in cancer

By Yen Yi Lin, Alexander Gawronski, Faraz Hach, Sujun Li, Ibrahim Numanagić, Iman Sarrafi, Swati Mishra, Andrew McPherson, Colin Collins, Milan Radovich, Haixu Tang, S. Cenk Sahinalp

Posted 25 Jul 2017
bioRxiv DOI: 10.1101/168377 (published DOI: 10.1093/bioinformatics/btx807)

Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample.In this paper we introduce a novel computational framework which can integratively analyze all three types of omics data to obtain a complete molecular profile of a tissue sample, in normal and disease conditions.Our framework includes MiStrVar, an algorithmic method we developed to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can provide an accurate profile of structurally aberrant transcripts in cancer samples.Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures in the respective proteomics data sets. Our framework's ability to observe structural aberrations at three levels of omics data provides means of validating their presence. We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq data sets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released.A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides.Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature.Interestingly, the vast majority of these translated aberrations (in particular, fusions) were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer.Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. Moreover, the most significantly enriched genes involved in translated fusions are cancer-related.Furthermore a number of the somatic, translated microSVs are observed in tumor suppressor genes.

Download data

  • Downloaded 430 times
  • Download rankings, all-time:
    • Site-wide: 27,304 out of 71,063
    • In bioinformatics: 3,667 out of 6,946
  • Year to date:
    • Site-wide: 63,201 out of 71,063
  • Since beginning of last month:
    • Site-wide: 35,938 out of 71,063

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)