Rxivist logo

Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 57,789 bioRxiv papers from 265,997 authors.

Computational proteogenomic identification and functional interpretation of translated fusions and micro structural variations in cancer

By Yen Yi Lin, Alexander Gawronski, Faraz Hach, Sujun Li, Ibrahim Numanagić, Iman Sarrafi, Swati Mishra, Andrew McPherson, Colin Collins, Milan Radovich, Haixu Tang, S. Cenk Sahinalp

Posted 25 Jul 2017
bioRxiv DOI: 10.1101/168377 (published DOI: 10.1093/bioinformatics/btx807)

Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample.In this paper we introduce a novel computational framework which can integratively analyze all three types of omics data to obtain a complete molecular profile of a tissue sample, in normal and disease conditions.Our framework includes MiStrVar, an algorithmic method we developed to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can provide an accurate profile of structurally aberrant transcripts in cancer samples.Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures in the respective proteomics data sets. Our framework's ability to observe structural aberrations at three levels of omics data provides means of validating their presence. We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq data sets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released.A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides.Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature.Interestingly, the vast majority of these translated aberrations (in particular, fusions) were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer.Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. Moreover, the most significantly enriched genes involved in translated fusions are cancer-related.Furthermore a number of the somatic, translated microSVs are observed in tumor suppressor genes.

Download data

  • Downloaded 372 times
  • Download rankings, all-time:
    • Site-wide: 24,504 out of 57,789
    • In bioinformatics: 3,400 out of 5,897
  • Year to date:
    • Site-wide: 33,815 out of 57,789
  • Since beginning of last month:
    • Site-wide: 24,917 out of 57,789

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News