Combining accurate tumour genome simulation with crowd-sourcing to benchmark somatic structural variant detection
Anna Y. Lee,
Adam D. Ewing,
Kathleen E. Houlahan,
Shadrielle Melijah G. Espiritu,
Takafumi N. Yamaguchi,
ICGC-TCGA DREAM Somatic Mutation Calling Challenge Participants,
Michael R. Kellen,
Thea C. Norman,
Stephen H. Friend,
Adam A. Margolin,
Paul C. Boutros
Posted 25 Nov 2017
bioRxiv DOI: 10.1101/224733 (published DOI: 10.1186/s13059-018-1539-5)
Background: The phenotypes of cancer cells are driven in part by somatic structural variants (SVs). SVs can initiate tumours, enhance their aggressiveness and provide unique therapeutic opportunities. Whole-genome sequencing of tumours can allow exhaustive identification of the specific SVs present in an individual cancer, facilitating both clinical diagnostics and the discovery of novel mutagenic mechanisms. A plethora of somatic SV detection algorithms have been created to enable these discoveries; however, there are no systematic benchmarks of them. Rigorous performance evaluation of somatic SV detection methods has been challenged by the lack of gold standards, extensive resource requirements and difficulties in sharing personal genomic information.

Results: To facilitate SV detection algorithm evaluations, we created a robust simulation framework for somatic SVs by extending the BAMSurgeon algorithm. We then organized and enabled a crowd-sourced benchmarking within the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SMC-DNA). We report here the results of SV benchmarking on three different tumours, comprising 204 submissions from 15 teams. In addition to ranking methods, we identify characteristic error profiles of individual algorithms and general trends across them. Surprisingly, we find that ensembles of analysis pipelines do not always outperform the best individual method, indicating a need for developing new ways to aggregate somatic SV detection approaches.

Conclusions: The synthetic tumours and somatic SV detection leaderboards remain available as a community benchmarking resource, and BAMSurgeon is available at https://github.com/adamewing/bamsurgeon.
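To make the benchmarking idea concrete, the following is a minimal sketch of how SV calls might be scored against a simulated truth set, and how a naive ensemble can be built from several callers. It is not the challenge's scoring code: the `Breakpoint` type, the `score_calls` and `majority_vote` functions, the 100 bp matching window, the two-caller support threshold and all coordinates are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical SV breakpoint: chromosome name and position."""
    chrom: str
    pos: int


def near(a, b, window):
    """True if two breakpoints share a chromosome and lie within `window` bp."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= window


def score_calls(truth, calls, window=100):
    """Return (precision, recall, f1) of `calls` against `truth`.

    A call counts as a true positive if it falls within `window` bp of a
    truth breakpoint that has not already been matched. The window size is
    an illustrative assumption, not a value taken from the challenge.
    """
    unmatched = list(truth)
    true_pos = 0
    for call in calls:
        candidates = [t for t in unmatched if near(t, call, window)]
        if candidates:
            # Match the closest remaining truth breakpoint and retire it.
            best = min(candidates, key=lambda t: abs(t.pos - call.pos))
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def majority_vote(call_sets, window=100, min_support=2):
    """Naive ensemble: keep a breakpoint reported by >= `min_support` callers."""
    kept = []
    for calls in call_sets:
        for call in calls:
            support = sum(
                any(near(call, other, window) for other in other_calls)
                for other_calls in call_sets
            )
            duplicate = any(near(call, k, window) for k in kept)
            if support >= min_support and not duplicate:
                kept.append(call)
    return kept


# Toy data (entirely made up): four simulated breakpoints and three callers.
truth = [Breakpoint("chr1", 1000), Breakpoint("chr1", 5000),
         Breakpoint("chr2", 2000), Breakpoint("chr3", 7000)]
caller_a = [Breakpoint("chr1", 1010), Breakpoint("chr1", 5020),
            Breakpoint("chr2", 2005), Breakpoint("chr3", 9000)]
caller_b = [Breakpoint("chr1", 990), Breakpoint("chr2", 8000)]
caller_c = [Breakpoint("chr1", 4990), Breakpoint("chr3", 9050),
            Breakpoint("chr2", 8040)]

ensemble = majority_vote([caller_a, caller_b, caller_c])

for name, calls in [("A", caller_a), ("B", caller_b),
                    ("C", caller_c), ("ensemble", ensemble)]:
    precision, recall, f1 = score_calls(truth, calls)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data the ensemble scores a lower F1 than caller A alone: voting drops a true breakpoint that only caller A found and keeps false positives that two callers share. This shows only one way such an outcome can arise under the stated assumptions; it makes no claim about the mechanisms observed in the challenge.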
- Downloaded 913 times
- Download rankings, all-time:
  - Site-wide: 21,142
  - In bioinformatics: 2,583
- Download rankings, year to date:
  - Site-wide: 91,006
- Download rankings, since beginning of last month:
  - Site-wide: 98,098