
Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments

By Christina K. Yung, Brian D. O’Connor, Sergei Yakneen, Junjun Zhang, Kyle Ellrott, Kortine Kleinheinz, Naoki Miyoshi, Keiran M. Raine, Romina Royo, Gordon B. Saksena, Matthias Schlesner, Solomon I. Shorser, Miguel Vazquez, Joachim Weischenfeldt, Denis Yuen, Adam P Butler, Brandi N. Davis-Dusenbery, Roland Eils, Vincent Ferretti, Robert L. Grossman, Olivier Harismendy, Youngwook Kim, Hidewaki Nakagawa, Steven J. Newhouse, David Torrents, Lincoln D Stein, on behalf of the PCAWG Technical Working Group, Javier Bartolomé Rodriguez, Keith A Boroevich, Rich Boyce, Angela N. Brooks, Alex Buchanan, Ivo Buchhalter, Niall J. Byrne, Andy Cafferkey, Peter J. Campbell, Zhaohong Chen, Sunghoon Cho, Wan Choi, Peter Clapham, Francisco M. De La Vega, Jonas Demeulemeester, Michelle T. Dow, Lewis J. Dursi, Juergen Eils, Claudiu Farcas, Francesco Favero, Nodirjon Fayzullaev, Paul Flicek, Nuno A. Fonseca, Josep Ll Gelpi, Gad A. Getz, Bob Gibson, Michael C. Heinold, Julian M. Hess, Oliver Hofmann, Jongwhi H. Hong, Thomas J. Hudson, Daniel Huebschmann, Barbara Hutter, Carolyn M. Hutter, Seiya Imoto, Sinisa Ivkovic, Seung-Hyup Jeon, Wei Jiao, Jongsun Jung, Rolf Kabbe, Andre Kahles, Jules Kerssemakers, Hyunghwan Kim, Hyung-Lae Kim, Jihoon Kim, Jan O. Korbel, Michael Koscher, Antonios Koures, Milena Kovacevic, Chris Lawerenz, Ignaty Leshchiner, Dimitri G. Livitz, George L. Mihaiescu, Sanja Mijalkovic, Ana Mijalkovic Lazic, Satoru Miyano, Hardeep K. Nahal, Mia Nastic, Jonathan Nicholson, David Ocana, Kazuhiro Ohi, Lucila Ohno-Machado, Larsson Omberg, B.F. Francis Ouellette, Nagarajan Paramasivam, Marc D. Perry, Todd D. Pihl, Manuel Prinz, Montserrat Puiggròs, Petar Radovic, Esther Rheinbay, Mara W. Rosenberg, Charles Short, Heidi J. Sofia, Jonathan Spring, Adam J Struck, Grace Tiao, Nebojsa Tijanic, Peter Van Loo, David Vicente, Jeremiah A. Wala, Zhining Wang, Johannes Werner, Ashley Williams, Youngchoon Woo, Adam J. Wright, Qian Xiang, the PCAWG Network

Posted 10 Jul 2017
bioRxiv DOI: 10.1101/161638

The International Cancer Genome Consortium (ICGC)'s Pan-Cancer Analysis of Whole Genomes (PCAWG) project aimed to catalogue somatic and germline variation in both coding and non-coding regions of the genomes of more than 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from geographically distributed locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; ran the analysis across a geographically and technologically disparate collection of compute environments; and disseminated high-quality, validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located through the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate the analysis on their own data.
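The abstract mentions merging the outputs of several variant callers into consensus calls but does not state the merging rule. As a rough illustration only, the sketch below shows one common scheme: keep a variant if at least `min_callers` independent callers report it. The `Variant` type, the `consensus` function, the caller names and the default threshold of 2 are all hypothetical choices for this example, not the PCAWG implementation. Variants are matched by exact (chrom, pos, ref, alt), whereas a real pipeline would first normalize variant representation.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: exact match on these four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus(call_sets: dict[str, Iterable[Variant]], min_callers: int = 2) -> set[Variant]:
    """Return variants reported by at least `min_callers` distinct callers (assumed voting rule)."""
    support: Counter = Counter()
    for calls in call_sets.values():
        # Deduplicate within a caller so one caller cannot vote twice for the same variant.
        for v in set(calls):
            support[v] += 1
    return {v for v, n in support.items() if n >= min_callers}


# Usage with made-up calls from three hypothetical callers.
calls = {
    "caller_a": [Variant("1", 100, "A", "T"), Variant("2", 500, "G", "C")],
    "caller_b": [Variant("1", 100, "A", "T")],
    "caller_c": [Variant("2", 500, "G", "C"), Variant("3", 42, "C", "G")],
}
print(sorted(consensus(calls)))
```

Here the first two variants are each reported by two callers and pass the threshold, while the third is reported by only one and is dropped. Raising `min_callers` trades sensitivity for precision.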

Download statistics

  • Downloaded 2,227 times
  • Download rankings, all-time:
    • Site-wide: 3,544 out of 100,957
    • In genomics: 627 out of 6,258
  • Year to date:
    • Site-wide: 21,133 out of 100,957
  • Since beginning of last month:
    • Site-wide: 29,471 out of 100,957
