annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions
Krutika S. Gaonkar,
Komal S. Rathi,
Nicholas A. Chimicles,
Miguel A. Brown,
Ammar S. Naqvi,
Phillip B. Storm,
John M. Maris,
Adam C. Resnick,
Jaclyn N. Taroni,
Jo Lynne Rokita
Posted 12 Nov 2019
bioRxiv DOI: 10.1101/839738
Background: Gene fusion events are a significant source of somatic variation across adult and pediatric cancers and are some of the most clinically effective therapeutic targets, yet low consensus among RNA-Seq fusion prediction algorithms makes therapeutic prioritization difficult. In addition, events such as polymerase read-throughs, mis-mapping due to gene homology, and fusions occurring in healthy normal tissue require informed filtering, making it difficult for researchers and clinicians to rapidly discern gene fusions that might be true underlying oncogenic drivers of a tumor and, in some cases, appropriate targets for therapy.

Results: We developed annoFuse, an R package, and shinyFuse, a companion web application, to annotate, prioritize, and explore biologically relevant expressed gene fusions, downstream of fusion calling. We validated annoFuse using a random cohort of TCGA RNA-Seq samples (N = 160) and achieved 96% sensitivity for retention of high-confidence fusions (N = 603). annoFuse uses FusionAnnotator annotations to filter non-oncogenic and/or artifactual fusions. Fusions are then prioritized if they were previously reported in TCGA and/or contain gene partners that are known oncogenes, tumor suppressor genes, COSMIC genes, and/or transcription factors. We applied annoFuse to fusion calls from pediatric brain tumor RNA-Seq samples (N = 1,028) provided as part of the Open Pediatric Brain Tumor Atlas (OpenPBTA) Project to determine recurrent fusions and recurrently fused genes within different brain tumor histologies. annoFuse annotates protein domains using the PFAM database, assesses reciprocality, and annotates gene partners for kinase domain retention. As a standard function, reportFuse enables generation of a reproducible R Markdown report to summarize filtered fusions, visualize breakpoints and protein domains by transcript, and plot recurrent fusions within cohorts. Finally, we created shinyFuse for algorithm-agnostic interactive exploration and plotting of gene fusions.

Conclusions: annoFuse provides standardized filtering and annotation for gene fusion calls from STARFusion and Arriba by merging, filtering, and prioritizing putative oncogenic fusions across large cancer datasets, as demonstrated here with data from the OpenPBTA project. We are expanding the package to be widely applicable to other fusion algorithms and expect annoFuse to provide researchers a method for rapidly evaluating, prioritizing, and translating fusion findings in patient tumors.

### Competing Interest Statement

The authors have declared no competing interest.

### Abbreviations

- ALL: Acute Lymphoblastic Leukemia
- BAM: Binary Alignment Map
- COSMIC: Catalogue Of Somatic Mutations In Cancer
- CNS: Central Nervous System
- DGD_PARALOGS: Duplicated Genes Database annotated paralogs
- GSEA: Gene Set Enrichment Analysis
- HGNC_GENEFAM: HGNC annotated gene family
- FPKM: Fragments Per Kilobase Million
- OpenPBTA: Open Pediatric Brain Tumor Atlas
- PI3_PI4_kinase: Phosphatidylinositol 3- and 4-kinase
- Pkinase: Protein kinase domain
- Pkinase_C: Protein kinase C terminal domain
- Pkinase_Tyr: Protein tyrosine kinase
- PPTC: Pediatric Preclinical Testing Consortium
- RNA: Ribonucleic Acid
- SAM: Sequence Alignment Map
- SMC-RNA: Somatic Mutation Calling RNA DREAM Challenge
- TCGA: The Cancer Genome Atlas
- TSV: Tab Separated Value
- TPM: Transcripts Per Kilobase Per Million
- WHO: World Health Organization
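The filter, prioritize, and recurrence steps described in the Results can be illustrated with a small sketch. This is not the annoFuse API (annoFuse is an R package). The function names, annotation tags, gene list, and input records below are all hypothetical and serve only to show the order of operations on toy data.

```python
from collections import Counter

# Hypothetical annotation tags marking likely artifacts (illustration only).
ARTIFACT_TAGS = {"readthrough", "normal_tissue", "paralog"}
# Hypothetical stand-in for oncogene / tumor suppressor / COSMIC / TF lists.
GENES_OF_INTEREST = {"BRAF"}


def filter_fusions(calls):
    """Drop calls that carry any tag marking them as a likely artifact."""
    return [c for c in calls if not (set(c["tags"]) & ARTIFACT_TAGS)]


def prioritize(calls):
    """Keep calls previously reported or with a partner in the gene list."""
    return [
        c for c in calls
        if c["reported"] or ({c["gene1"], c["gene2"]} & GENES_OF_INTEREST)
    ]


def recurrent_fusions(calls, min_samples=2):
    """Count distinct samples per fusion; keep fusions seen in >= min_samples."""
    seen = {(c["sample"], c["gene1"], c["gene2"]) for c in calls}
    counts = Counter((g1, g2) for _, g1, g2 in seen)
    return {f"{g1}--{g2}": n for (g1, g2), n in counts.items() if n >= min_samples}


# Toy input records (made up for this example, not real results).
calls = [
    {"sample": "S1", "gene1": "KIAA1549", "gene2": "BRAF", "tags": [], "reported": False},
    {"sample": "S2", "gene1": "KIAA1549", "gene2": "BRAF", "tags": [], "reported": False},
    {"sample": "S2", "gene1": "GENEA", "gene2": "GENEB", "tags": ["readthrough"], "reported": False},
    {"sample": "S3", "gene1": "GENEC", "gene2": "GENED", "tags": [], "reported": False},
]

kept = prioritize(filter_fusions(calls))
recurrent = recurrent_fusions(kept)
print(recurrent)
```

Filtering runs before prioritization so that a gene of interest cannot rescue a call already flagged as an artifact. Whether annoFuse applies exactly this precedence is an assumption of the sketch.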